Sr. Data Scientist Resume CA - Hire IT People

PROFESSIONAL SUMMARY:

Over 10 years of overall IT experience in Data Science/Machine Learning and Data Warehouse applications using Informatica, Oracle and Teradata
Proficient in advising on the use of data for compiling personnel and statistical reports, preparing personnel action documents, identifying patterns within data, analyzing data and interpreting results
Strong ability to analyze sets of data for signals, patterns, ways to group data to answer questions and solve complex data puzzles
Skilled in Advanced Regression Modeling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools and application of Statistical Concepts
Proficient in: Data Acquisition, Storage, Analysis, Integration, Predictive Modeling, Logistic Regression, Decision Trees, Data Mining Methods, Forecasting, Factor Analysis, Cluster Analysis, Neural Networks and other advanced statistical and econometric techniques
Adept in writing code in R and T-SQL scripts to manipulate data for data loads and extracts
Proficient in data entry, data auditing, creating data reports & monitoring data for accuracy
Experience in web search and data collection, web data mining, extracting data from websites, data entry and data processing
Strong experience with R Visualization, QlikView and Tableau to use in data analytics and graphic visualization
Extensively worked on using major statistical analysis tools such as R, SQL, SAS, and MATLAB
Strong knowledge in all phases of the SDLC (Software Development Life Cycle) from analysis, design, development, testing, implementation and maintenance with timely delivery against deadlines
Good knowledge and understanding of data mining techniques like classification, clustering, regression techniques and random forests
Extensive experience with creating MapReduce jobs, SQL on Hadoop using Hive and ETL using PIG scripts, and Flume for transferring unstructured data to HDFS
Strong Oracle/SQL Server programming skills, with experience in working with functions, packages and triggers
Experience in all phases of Data warehouse development from Requirements, analysis, design, development, testing and post production support
Strong in-depth knowledge in doing data analysis, data quality and source system analysis.
Independent, Self-starter, enthusiastic team player with strong adaptability to new technologies
Experience in Big Data Technologies using Hadoop, Sqoop, Pig and Hive.
Experience in writing Hive and Unix shell scripts
Excellent track record in delivering quality software on time to meet the business priorities.
Developed Data Warehouse/Data Mart systems, using various RDBMS (Oracle, MS-SQL Server, Mainframes, Teradata and DB2)
Highly proficient in using Informatica Power Center and Power Exchange, with exposure to Informatica Data Services.

TECHNICAL SKILLS:

Programming Skills: R language, Python, PL/SQL

Databases: Teradata 12/13/14, Oracle 9i/10g/11g/12c, MySQL, SQL Server 2000/2005, MS Access, DB2, Hadoop (HDFS)

Libraries: Scikit-learn, Keras, TensorFlow, NumPy, Pandas, NLTK, Gensim, Matplotlib, ggplot2

Operating Systems: Windows, Unix, Linux

Web Related: ASP.NET, VBScript, HTML, DHTML, Java, JavaScript

Tools: Teradata Parallel Transporter, Aprimo 6.1/8.X, BTEQ, SQL Assistant, Toad, SQL Navigator, SQL*Loader, $U, HP Quality Center, PVCS, DataFlux, UC4, Control-M

Domain Knowledge: Banking, Finance, Insurance, Health Care, Energy

PROFESSIONAL EXPERIENCE:

Confidential, CA

Sr. Data Scientist

Responsibilities:

This project focused on customer segmentation based on machine learning and statistical modeling, including building predictive models and generating data products to support customer segmentation
Developed a pricing model for various bundled product and service offerings to optimize and predict gross margin
Built a price elasticity model for various bundled product and service offerings
Developed a predictive causal model using annual failure rate and standard cost basis for the new bundled service offering
Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, product recommendation and allocation planning
Partnered with the Sales and Marketing teams and collaborated with a cross-functional team to frame and answer important data questions, prototyping and experimenting with ML/DL algorithms and integrating them into production systems for different business needs
Worked on multiple datasets containing 2 billion values of structured and unstructured data about web application usage and online customer surveys
Gained hands-on experience with the Amazon Redshift platform
Designed, built and deployed a set of Python modeling APIs for customer analytics, which integrate multiple machine learning techniques for various user behavior predictions and support multiple marketing segmentation programs
Segmented the customers based on demographics using K-means Clustering
Explored different regression and ensemble models in machine learning to perform forecasting
Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring
Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI

Environment: MS SQL Server, R/R Studio, Python, Redshift, MS Excel, Power BI, Tableau, T-SQL, ETL, MS Access, XML, MS Office 2007, Outlook.

Confidential, Sacramento, CA

Data Scientist

Responsibilities:

Analyzed and prepared data and identified patterns in the dataset by applying historical models
Collaborated with Senior Data Scientists to understand the data
Performed data manipulation, data preparation, normalization, and predictive modeling
Improved efficiency and accuracy by evaluating the model in R
Presented the existing model to stakeholders and provided insights into the model using different visualization methods in Power BI
Used R and Python to improve the model
Upgraded all models to improve the product
Performed data cleaning, applying backward and forward filling methods to the dataset to handle missing values
Under the supervision of a Sr. Data Scientist, performed data transformation for rescaling and normalizing variables
Developed a predictive model and validated a neural network classification model to predict the feature label
Applied a boosting method to the predictive model to improve its efficiency
Presented Dashboards to Higher Management for more Insights using Power BI

Environment: R/R Studio, Python, SQL Enterprise Manager, GitHub, Microsoft Power BI, Outlook.

Confidential, CA

Data Scientist

Responsibilities:

Used various approaches to collect the business requirements and worked with the business users for ETL application enhancements by conducting various JRD sessions to meet the job requirements
Designed data profiles for processing, including running PL/SQL queries and using R for data acquisition and data integrity checks, consisting of dataset comparisons and dataset schema checks
Performed exploratory data analysis like calculation of descriptive statistics, detection of outliers, assumptions testing, factor analysis, etc., in R
Conducted data/statistical analysis and generated a Transaction Performance Report on a monthly and quarterly basis for all transactional data from the U.S., Canada, and Latin America markets using SQL Server and BI tools such as Reporting Services and Integration Services (SSRS and SSIS)
Used R to generate regression models to provide statistical forecasting
Applied Clustering Algorithms such as K-Means to categorize customers into certain groups
Implemented Key Performance Indicator (KPI) Objects, Actions, Hierarchies and Attribute Relationships for added functionality and better performance of SSAS Warehouse
Used Tableau to design various charts and tables for data analysis and to create analytical dashboards that showcase the data to managers
Performed data management, including using SQL Server Reporting Services to develop reusable code and an automated reporting system, and designed user acceptance tests to give end users an opportunity to provide constructive feedback

Environment: R/R Studio, SAS, Oracle Database 11g, Oracle BI tools, Tableau, MS-Excel

Confidential, CA

ETL and Teradata Developer

Responsibilities:

Analysis, Design, Development, Testing and Deployment of Informatica workflows, BTEQ scripts, Python and shell scripts.
Performed source system analysis, provided input to data modeling, and developed ETL design documents as per business requirements.
Designed, developed and tested the various Mappings, Mapplets, Worklets and Workflows involved in the ETL process.
Developed and integrated data quality measures into the ETL framework using Informatica Data Quality (IDQ).
Experience in data profiling using IDQ for input into ETL Design and Data Modelling.
Extensively used ETL to transfer data from different source systems and load the data into the target DB.
Developing Informatica mappings with the collection of all Sources, Targets, and Transformations using Informatica Designer.
Extracting data from various sources across the organization (Oracle, MySQL, SQL Server and Flat files) and loading into staging area.

Environment: Teradata, Oracle, PL/SQL, MySQL, Informatica Power Center, Power Exchange, IDQ, OCL Tool, UC4, Control-M, ER Viewer, Business Intelligence, Windows, HP Quality center, Unix, Linux.

Confidential, Annapolis, MD

ETL Developer

Responsibilities:

Developed Low level mappings for Tables and columns from source to target systems.
Wrote and optimized initial data load scripts using Informatica and database utilities.
Used partitions to extract data from the source and load it into Teradata using TPT load with proper load balancing on the Teradata server.
Wrote complex BTEQ scripts to incorporate business functionality in transforming the data from staging into third normal form.
Participated in the Teradata upgrade project from TD12 to TD13.10, conducting regression testing.

Environment: Teradata, Oracle, PL/SQL, MySQL, Informatica Power Center, SSIS, SSRS, ER Viewer, Windows, HP Quality center, UNIX.

Confidential, Owings Mills, MD

Senior ETL Developer

Responsibilities:

Created Uprocs, Sessions, Management Unit to schedule jobs using $U.
Conducted source system analysis and developed ETL design documents to meet business requirements.
Tuned Teradata SQL queries and resolved performance issues caused by data skew and spool space limits.
Generated flat files from Teradata using FastExport and BTEQ to disseminate to downstream dependent systems.

Environment: Teradata, Oracle, PL/SQL, Informatica Power Center, $U, Business Objects, SSIS, Windows XP, UNIX Shell scripting.

Confidential, Temple, TX

ETL Developer

Responsibilities:

Documenting functional specifications and other aspects used for the development of ETL mappings
Designed, developed and tested the various Mappings, Mapplets, Worklets and Workflows.
Optimized Performance of existing Informatica workflows.
Involved in fixing invalid Mappings, testing of Stored Procedures and Functions, Unit and Integration Testing of Informatica Sessions, Batches and the Target Data.

Environment: Oracle, SQL Server, DB2, Informatica Power Center, Erwin, Cognos, XML, Windows, Unix

Confidential, Minnesota, MN

ETL Developer

Responsibilities:

Developed various Mappings with the collection of all Sources, Targets, and Transformations using Informatica Designer
Extracted data from various sources across the organization (Oracle, SQL Server and flat files) and loaded it into the staging area
Created and scheduled Sessions and Batch Processes to run on demand, on schedule, or only once using Informatica Workflow Manager, and monitored the data loads using the Workflow Monitor

Environment: Oracle, SQL Server, PL/SQL, Informatica Power Center, Erwin, Cognos, Windows, UNIX
