Sr. Data Scientist Resume
CA
PROFESSIONAL SUMMARY:
- Over 10 years of overall IT experience in data science, machine learning and data warehouse applications using Informatica, Oracle and Teradata
- Proficient in advising on the use of data for compiling personnel and statistical reports and preparing personnel action documents, identifying patterns within data, analyzing data and interpreting results
- Strong ability to analyze data sets for signals, patterns and ways to group data in order to answer questions and solve complex data puzzles
- Skilled in Advanced Regression Modeling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools and application of Statistical Concepts
- Proficient in: Data Acquisition, Storage, Analysis, Integration, Predictive Modeling, Logistic Regression, Decision Trees, Data Mining Methods, Forecasting, Factor Analysis, Cluster Analysis, Neural Networks and other advanced statistical and econometric techniques
- Adept at writing R code and T-SQL scripts to manipulate data for data loads and extracts
- Proficient in data entry, data auditing, creating data reports & monitoring data for accuracy
- Experienced in web search and data collection, web data mining, extracting data from websites, data entry and data processing
- Strong experience with R visualization, QlikView and Tableau for data analytics and graphical visualization
- Worked extensively with major statistical analysis tools such as R, SQL, SAS, and MATLAB
- Strong knowledge of all phases of the SDLC (Software Development Life Cycle), from analysis and design through development, testing, implementation and maintenance, with timely delivery against deadlines
- Good knowledge and understanding of data mining techniques like classification, clustering, regression techniques and random forests
- Extensive experience with creating MapReduce jobs, SQL on Hadoop using Hive and ETL using PIG scripts, and Flume for transferring unstructured data to HDFS
- Strong Oracle/SQL Server programming skills, with experience in working with functions, packages and triggers
- Experience in all phases of data warehouse development, from requirements, analysis and design through development, testing and post-production support
- Strong in-depth knowledge of data analysis, data quality and source system analysis.
- Independent self-starter and enthusiastic team player with strong adaptability to new technologies
- Experience in Big Data Technologies using Hadoop, Sqoop, Pig and Hive.
- Experience in writing Hive and Unix shell scripts
- Excellent track record in delivering quality software on time to meet the business priorities.
- Developed Data Warehouse/Data Mart systems, using various RDBMS (Oracle, MS-SQL Server, Mainframes, Teradata and DB2)
- Highly proficient in Informatica Power Center and Power Exchange, with exposure to Informatica Data Services.
TECHNICAL SKILLS:
Programming Skills: R language, Python, PL/SQL
Databases: Teradata 12/13/14, Oracle 9i/10g/11g/12c, MySQL, SQL Server 2000/2005, MS Access, DB2, Hadoop (HDFS)
Libraries: Scikit-learn, Keras, TensorFlow, NumPy, Pandas, NLTK, Gensim, Matplotlib, ggplot2
Operating Systems: Windows, Unix, Linux
Web Related: ASP.NET, VBScript, HTML, DHTML, Java, JavaScript
Tools: Teradata Parallel Transporter, Aprimo 6.1/8.X, BTEQ, SQL Assistant, Toad, SQL Navigator, SQL*Loader, $U, HP Quality Center, PVCS, DataFlux, UC4, Control-M
Domain Knowledge: Banking, Finance, Insurance, Health Care, Energy
PROFESSIONAL EXPERIENCE:
Confidential, CA
Sr. Data Scientist
Responsibilities:
- Worked on a customer segmentation project based on machine learning and statistical modeling, including building predictive models and generating data products to support the segmentation
- Developed a pricing model for various bundled product and service offerings to optimize and predict gross margin
- Built a price elasticity model for various bundled product and service offerings
- Developed a predictive causal model using annual failure rate and standard cost basis for the new bundled service offering
- Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, including product recommendation and allocation planning
- Partnered with the sales and marketing teams and collaborated with a cross-functional team to frame and answer important data questions, prototyping and experimenting with ML/DL algorithms and integrating them into production systems for different business needs
- Worked on multiple datasets containing 2 billion values of structured and unstructured data about web application usage and online customer surveys
- Gained hands-on experience with the Amazon Redshift platform
- Designed, built and deployed a set of Python modeling APIs for customer analytics, which integrate multiple machine learning techniques for user behavior prediction and support multiple marketing segmentation programs
- Segmented the customers based on demographics using K-means Clustering
- Explored different regression and ensemble models in machine learning to perform forecasting
- Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral
- Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI
Environment: MS SQL Server, R/R Studio, Python, Redshift, MS Excel, Power BI, Tableau, T-SQL, ETL, MS Access, XML, MS Office 2007, Outlook.
Confidential, Sacramento, CA
Data Scientist
Responsibilities:
- Analyzed and prepared data and identified patterns in the dataset by applying historical models
- Collaborated with senior data scientists to understand the data
- Performed data manipulation, data preparation, normalization, and predictive modeling
- Improved efficiency and accuracy by evaluating models in R
- Presented the existing model to stakeholders and provided insights into the model using different visualization methods in Power BI
- Used R and Python to improve the model
- Upgraded all models to improve the product
- Performed data cleaning, applying backward and forward filling methods to the dataset to handle missing values
- Under the supervision of the Sr. Data Scientist, performed data transformation to rescale and normalize variables
- Developed and validated a neural network classification model to predict the feature label
- Applied boosting to the predictive model to improve its efficiency
- Presented dashboards to higher management using Power BI to provide further insights
Environment: R/R Studio, Python, SQL Enterprise Manager, GitHub, Microsoft Power BI, Outlook.
Confidential, CA
Data Scientist
Responsibilities:
- Collected business requirements using various approaches and worked with business users on ETL application enhancements, conducting JRD sessions to meet the job requirements
- Designed data profiles for processing, including running PL/SQL queries and using R for data acquisition and data integrity checks, consisting of dataset comparisons and dataset schema checks
- Performed exploratory data analysis in R, such as calculating descriptive statistics, detecting outliers, testing assumptions and running factor analysis
- Conducted data/statistical analysis and generated monthly and quarterly Transaction Performance Reports for all transactional data from the U.S., Canada, and Latin America markets using SQL Server and BI tools such as Reporting Services and Integration Services (SSRS and SSIS)
- Used R to generate regression models to provide statistical forecasting
- Applied Clustering Algorithms such as K-Means to categorize customers into certain groups
- Implemented Key Performance Indicator (KPI) Objects, Actions, Hierarchies and Attribute Relationships for added functionality and better performance of SSAS Warehouse
- Used Tableau to design charts and tables for data analysis and to create analytical dashboards that showcase the data to managers
- Performed data management, including using SQL Server Reporting Services to develop reusable code and an automated reporting system, and designed user acceptance tests to give end users an opportunity to provide constructive feedback
Environment: R/R Studio, SAS, Oracle Database 11g, Oracle BI tools, Tableau, MS-Excel
Confidential, CA
ETL and Teradata Developer
Responsibilities:
- Performed analysis, design, development, testing and deployment of Informatica workflows, BTEQ scripts, and Python and shell scripts.
- Performed source system analysis, provided input to data modeling, and developed ETL design documents as per business requirements.
- Designed, developed and tested the various mappings, mapplets, worklets and workflows involved in the ETL process.
- Developed and integrated data quality measures into the ETL framework using Informatica Data Quality (IDQ).
- Performed data profiling using IDQ as input to ETL design and data modeling.
- Used ETL extensively to transfer data from different source systems and load it into the target DB.
- Developed Informatica mappings with the collection of all sources, targets, and transformations using Informatica Designer.
- Extracted data from various sources across the organization (Oracle, MySQL, SQL Server and flat files) and loaded it into the staging area.
Environment: Teradata, Oracle, PL/SQL, MySQL, Informatica Power Center, Power Exchange, IDQ, OCL Tool, UC4, Control-M, ER Viewer, Business Intelligence, Windows, HP Quality Center, Unix, Linux.
Confidential, Annapolis, MD
ETL Developer
Responsibilities:
- Developed low-level mappings for tables and columns from source to target systems.
- Wrote and optimized initial data load scripts using Informatica and database utilities.
- Used partitions to extract data from the source and load it into Teradata using TPT load, with proper load balancing on the Teradata server.
- Wrote complex BTEQ scripts to incorporate business functionality when transforming the data from staging into third normal form.
- Participated in the Teradata upgrade project from TD12 to TD13.10, conducting regression testing.
Environment: Teradata, Oracle, PL/SQL, MySQL, Informatica Power Center, SSIS, SSRS, ER Viewer, Windows, HP Quality Center, UNIX.
Confidential, Owings Mills, MD
Senior ETL Developer
Responsibilities:
- Created Uprocs, Sessions and Management Units to schedule jobs using $U.
- Conducted source system analysis and developed ETL design documents to meet business requirements.
- Tuned Teradata SQL queries and resolved performance issues caused by data skew and spool space.
- Created flat files from Teradata using FastExport and BTEQ for dissemination to downstream dependent systems.
Environment: Teradata, Oracle, PL/SQL, Informatica Power Center, $U, Business Objects, SSIS, Windows XP, UNIX Shell scripting.
Confidential, Temple, TX
ETL Developer
Responsibilities:
- Documented functional specifications and other aspects used for the development of ETL mappings
- Designed, developed and tested the various mappings, mapplets, worklets and workflows.
- Optimized Performance of existing Informatica workflows.
- Fixed invalid mappings, tested stored procedures and functions, and performed unit and integration testing of Informatica sessions, batches and the target data.
Environment: Oracle, SQL Server, DB2, Informatica Power Center, Erwin, Cognos, XML, Windows, Unix
Confidential, Minnesota, MN
ETL Developer
Responsibilities:
- Developed various Mappings with the collection of all Sources, Targets, and Transformations using Informatica Designer
- Extracted data from various sources across the organization (Oracle, SQL Server and flat files) and loaded it into the staging area
- Created and scheduled sessions and batch processes to run on demand, at a scheduled time, or only once using Informatica Workflow Manager, and monitored the data loads using the Workflow Monitor
Environment: Oracle, SQL Server, PL/SQL, Informatica Power Center, Erwin, Cognos, Windows, UNIX