Data Scientist Resume
Bloomfield, CT
SUMMARY:
- Around 8 years of IT experience as a Data Scientist, including deep expertise in statistical data analysis: transforming business requirements into analytical models, designing algorithms, and building strategic solutions that scale across massive volumes of data.
- Proficient in statistical methods such as regression models, hypothesis testing, confidence intervals, principal component analysis, and dimensionality reduction.
- Served as a Big Data solution architect, defining future-state capabilities using the Hadoop ecosystem: HBase, Hive, Talend, Sqoop, MapR, Pig, Spark, Kafka, and Flume.
- Experience in Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export using multiple ETL tools such as Ab Initio and Informatica PowerCenter.
- Expert in R and Python scripting. Worked with statistical functions in NumPy, visualization using Matplotlib/Seaborn, and Pandas for organizing data.
- 5 years of experience in object-oriented programming with high-level languages.
- 4 years of experience in Scala and Spark.
- Worked with various Python packages installed via pip - Sphinx, ReportLab, xlwt, xlrd, virtualenv, lxml, etc.
- Experience in using various packages in R and Python such as ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, SciPy, scikit-learn, Beautiful Soup, Rpy2.
- Extensive experience in Text Analytics, generating data visualizations using R, Python and creating dashboards using tools like Tableau.
- Hands-on experience in Data Extraction, Data Mining, and Analysis, and in making presentations/recommendations based on analysis results.
- Hands-on experience in ETL tools like Talend and Informatica.
- Hands-on experience in Master Data Management (MDM) in Talend.
- Worked on TDM, which can be deployed on premises, in the cloud, on big data platforms, and via hybrid cloud configurations.
- Provided technical insight and assistance to data modelers in building new designs adhering to the existing data warehouse architecture; reviewed data models and suggested improvements.
- Experience in writing code in R and Python to manipulate data for data loads, extracts, statistical analysis, modeling, and data munging.
- Utilized analytical applications like R, SPSS, Rattle and Python to identify trends and relationships between different pieces of data, draw appropriate conclusions and translate analytical findings into risk management and marketing strategies that drive value.
- Skilled in performing data parsing, data manipulation, and data preparation with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape (an illustrative Python sketch follows this summary).
- Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
- Professional working experience in Machine Learning algorithms such as LDA, Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Clustering, and Principal Component Analysis.
- Hands on experience with big data tools like Hadoop, Spark, Hive, Pig, Impala, Pyspark, Spark SQL.
- Experience working with data modeling tools like Erwin, Power Designer and ER Studio.
- Experience in designing star schema, Snowflake schema for Data Warehouse, ODS Architecture.
- Good knowledge in Database Creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2, MongoDB, HBase and SQL Server databases.
- Experienced in writing complex SQL queries such as stored procedures, triggers, joins, and subqueries.
- Interpret problems and provide solutions to business problems using data analysis, data mining, optimization tools, machine learning techniques, and statistics.
- Good knowledge of AWS cloud (S3) and GIS in big data technologies.
- Knowledge of working with Proofs of Concept (PoCs) and gap analysis; gathered necessary data for analysis from various sources and prepared it for exploration using data munging and Teradata.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
- Ability to work with managers and executives to understand business objectives and deliver on business needs; a firm believer in teamwork.
- Experience and domain knowledge in various industries such as healthcare, insurance, retail, banking, media and technology.
- Work closely with customers, cross-functional teams, research scientists, software developers, and business teams in an Agile/Scrum environment to drive data model implementations and algorithms into practice.
- Strong written and oral communication skills for giving presentations to non-technical stakeholders.
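Illustrative sketch (referenced above): a minimal Python/pandas example of the data preparation methods listed in this summary - describe, regex split-and-combine, merge/subset, and melt/reshape. All column names and values are hypothetical and invented purely for illustration.

import pandas as pd

# Hypothetical raw extract: one row per customer interaction (invented data).
raw = pd.DataFrame({
    "customer_id": [101, 102, 101, 103],
    "channel_date": ["phone_2021-03-01", "web_2021-03-02",
                     "phone_2021-03-05", "email_2021-03-06"],
    "handle_time": [320, 185, 290, 410],
})

# Describe data contents and compute descriptive statistics.
print(raw.dtypes)
print(raw["handle_time"].describe())

# Regex split-and-combine: break a composite field into clean columns.
parts = raw["channel_date"].str.extract(r"(?P<channel>\w+)_(?P<date>\d{4}-\d{2}-\d{2})")
raw = raw.join(parts)
raw["date"] = pd.to_datetime(raw["date"])

# Merge/subset: join a reference table and keep only phone contacts.
segments = pd.DataFrame({"customer_id": [101, 102, 103],
                         "segment": ["gold", "silver", "bronze"]})
merged = raw.merge(segments, on="customer_id", how="left")
phone_only = merged[merged["channel"] == "phone"]

# Melt/reshape: wide pivot of mean handle time by channel, then back to long form.
wide = merged.pivot_table(index="customer_id", columns="channel",
                          values="handle_time", aggfunc="mean")
long_again = wide.reset_index().melt(id_vars="customer_id",
                                     value_name="handle_time")
print(phone_only)
print(long_again)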
PROFESSIONAL EXPERIENCE:
Data Scientist
Confidential - Bloomfield, CT
Responsibilities:
- This project focused on customer segmentation through machine learning and statistical modeling, including building predictive models and generating data products to support segmentation.
- Developed a Random Forest model to identify repeat-call customers, reducing repeat calls by 10%.
- Designed a multi-layered model to identify the type of service request within 2 minutes, reducing average handle time (AHT) per call by 16%.
- Implemented and Deployed a hybrid recommender system to reduce the handle time of a call.
- Planned strategic workforce scaling by implementing an employee predictive model; made recommendations that reduced the effect of job abandonment by 5%.
- Worked on Google Cloud Platform and Implemented business logic using Python
- Worked on Talend Open Studio for Data Integration.
- Worked on MDM (Master Data Management) in Talend, through which an organization builds and manages a single, consistent, accurate view of key enterprise data.
- Worked on Informatica TDM, to achieve shorter development cycles and faster deployment while improving compliance with data privacy regulations.
- Worked on real-time multitasking programs using Python.
- Designed and implemented components using Python.
- Experience working with Spark SQL and creating RDDs using PySpark.
- Worked extensively on the Optum Big Data strategy, providing Big Data as a service and a Data Fabric framework.
- Architected solutions for migrating legacy systems to the Big Data platform.
- Experience working with ETL of large datasets using PySpark on HDFS.
- Developed entire frontend and backend modules using R
- Created User Interface (UI) using JavaScript, bootstrap and HTML5/CSS.
- Worked on frontend frameworks like CSS Bootstrap for development of Web applications.
- Developed and tested many features for dashboard using Python, Bootstrap, CSS, and JavaScript.
- Design and develop analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment and product recommendation and allocation planning.
- Worked on AWS Cloud to process, store, and visualize big data.
- Worked on GIS to create maps for the database system.
- Worked with the SAS programming language for statistical analysis.
- Developed Spark streaming applications for consuming the data from Kafka topics.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Analyzed the data using Spark Data Frames and series of Hive Scripts to produce summarized results from Hadoop to downstream systems.
- Modified the Ab Initio graphs to utilize data parallelism and thereby improve the overall performance to fine-tune the execution times.
- Partnered and collaborated with Sales, Marketing, and cross-functional teams to frame and answer important data questions, prototyping and experimenting with Machine Learning/Deep Learning algorithms and integrating them into production systems for different business needs.
- Experience with Data Masking, whose purpose is to protect the actual data while providing a functional substitute for occasions when the real data is not required.
- Used Data Profiling to analyze data patterns within SQL Server.
- Worked with Hadoop, which provides reliable shared storage (HDFS) and an analysis system (MapReduce).
- Performed data governance activities to track the database.
- Implemented business logic using Python/Django.
- Worked on multiple datasets containing 2 billion values of structured and unstructured data about web application usage and online customer surveys.
- Good hands-on experience with the Amazon Redshift platform.
- Designed, built, and deployed a set of Python modeling APIs for customer analytics, integrating multiple machine learning techniques for user behavior prediction and supporting multiple marketing segmentation programs.
- Segmented the customers based on demographics using K-means Clustering
- Explored different regression and ensemble models in machine learning to perform forecasting
- Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring (an illustrative Python sketch follows this section).
- Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI
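Illustrative sketch (referenced above): a minimal scikit-learn example of the kind of Random Forest classification used to flag likely repeat-call customers. The feature names, labels, and data below are synthetic placeholders, not the project's actual variables.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical feature set: call-history attributes per customer.
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "calls_last_30d": rng.poisson(2, 1000),
    "avg_handle_time": rng.normal(300, 60, 1000),
    "open_tickets": rng.poisson(1, 1000),
})
# Synthetic label: 1 = customer called again within 7 days.
y = (X["calls_last_30d"] + rng.normal(0, 1, 1000) > 3).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Fit a Random Forest and inspect hold-out performance.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))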
Environment: MS SQL Server, R/RStudio, AWS, Python, Redshift, MS Excel, Power BI, Tableau, T-SQL, ETL, MS Access, XML, MS Office 2007, Outlook.
Data Scientist
Confidential - Columbus, OH
Responsibilities:
- Provided expertise and recommendations for physical database design, architecture, testing, performance tuning, and implementation.
- Designed logical and physical data models for multiple OLTP and analytic applications.
- Extensively used the Erwin design tool and Erwin Model Manager to create and maintain the Data Mart.
- Designed the physical model for implementation in an Oracle 9i physical database.
- Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Performed database performance tuning, including indexing, optimizing SQL statements, and monitoring the server.
- Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
- Collaborated on the source-to-target data mapping document and on data quality assessments for the source data.
- Applied expert-level understanding of different databases in combination for data extraction and loading, joining data extracted from different databases and loading it into a specific target database.
- Coordinated with various business users, stakeholders, and SMEs for functional expertise, design and business test scenario reviews, UAT participation, and validation of financial data.
- Worked very closely with Data Architects and the DBA team to implement data model changes in the database across all environments.
- Worked on Amazon Athena to analyze petabytes of data in Amazon S3 using ANSI SQL.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Improved the performance of existing data warehouse applications to increase the efficiency of the system.
- Designed and developed Use Case, Activity, and Sequence diagrams and OOD (Object-Oriented Design) using UML and Visio.
Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.
Data Analyst/R Developer
Confidential - Hartford, CT
Responsibilities:
- Involved in Analysis, Design and Implementation/translation of Business User requirements.
- Worked on large sets of Structured and Unstructured data.
- Actively involved in designing and developing data ingestion, aggregation, and integration in Hadoop environment.
- Developed Sqoop scripts to import and export data from relational sources, handling incremental loading of customer and transaction data by date.
- Experience in Designing and developing Machine Learning models.
- Experience in creating Hive Tables, Partitioning and Bucketing.
- Performed data analysis and data profiling using complex SQL queries on various source systems including Oracle 10g/11g and SQL Server 2012.
- Experienced with Test Data Management (TDM) tools like Informatica; TDM helps organizations create better-quality software that performs reliably on deployment.
- Identified inconsistencies in data collected from different sources.
- Worked with business owners/stakeholders to assess risk impact and provided solutions to business owners.
- Worked intensively on understanding data patterns and structure, thereby identifying the strengths and weaknesses of the data.
- Created data quality projects, folders, and data objects for cleansing, scrubbing, and standardizing data by defining the corresponding rules.
- Experienced in determining trends and significant data relationships using advanced statistical methods.
- Carried out specified data processing and statistical techniques such as sampling, estimation, hypothesis testing, time series, correlation, and regression analysis using SAS and R (an illustrative Python sketch follows this section).
- Applied various data mining techniques: Linear Regression & Logistic Regression, classification, clustering.
- Took personal responsibility for meeting deadlines and delivering high quality work.
- Strived to continually improve existing methodologies, processes, and deliverable templates.
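Illustrative sketch (referenced above): a minimal SciPy example of hypothesis testing and correlation/regression analysis of the kind performed on this project (there carried out in SAS and R); the data below is synthetic and all variable names are hypothetical.

import numpy as np
from scipy import stats

# Hypothetical samples: transaction amounts from two customer groups.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=52.0, scale=8.0, size=200)
group_b = rng.normal(loc=55.0, scale=8.0, size=200)

# Hypothesis test: do the group means differ significantly?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Correlation and simple regression between tenure and spend.
tenure = rng.uniform(1, 10, size=200)
spend = 20 + 3.5 * tenure + rng.normal(0, 5, size=200)
r, r_p = stats.pearsonr(tenure, spend)
slope, intercept, r_value, p_val, std_err = stats.linregress(tenure, spend)
print(f"pearson r = {r:.2f}, slope = {slope:.2f}, intercept = {intercept:.2f}")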
Environment: R, SQL Server, Oracle, HDFS, HBase, MapReduce, Hive, Impala, Pig, Sqoop, NoSQL, Tableau, Unix/Linux, Core Java, Log4j.
Big Data Analyst
Confidential - San Antonio, TX
Responsibilities:
- Responsible for analyzing large data sets to develop multiple custom models and algorithms to drive innovative business solutions.
- Performed preliminary data analysis, handled anomalies such as missing values, duplicates, and outliers, and imputed or removed irrelevant data.
- Provided architectural solutions and services for data intake, integration, and enrichment, covering metadata, data quality, and data security.
- Collaborated with key business and technical stakeholders in defining the architecture, strategic roadmaps, and business/technical solutions aligned with enterprise architecture goals.
- Worked extensively on the Optum Big Data strategy, providing Big Data as a service and a Data Fabric framework.
- Created Machine Learning algorithms for trainees and learners.
- Generated Java code using Talend that can be run anywhere.
- Worked on Data Subsetting in TDM: the process of slicing a part of the production database and loading it into the test database.
- Worked on TDM, which reduces bug fixes and rollbacks, creates a more cost-efficient software deployment process overall, and lowers the organization's compliance and security risks.
- Worked on Talend, building custom components in Java and integrating them into the Studio without any hassle.
- Worked on Data Masking for software testing and user training.
- Hands-on experience with the AWS cloud for processing big data.
- Removed outliers using proximity-, distance-, and density-based techniques.
- Involved in Analysis, Design and Implementation/translation of Business User requirements.
- Experienced in using supervised, unsupervised and regression techniques in building models.
- Performed Market Basket Analysis to identify groups of assets moving together and advised the client on the associated risks.
- Experience in determining trends and significant data relationships using advanced statistical methods.
- Experience with Hadoop, which is highly scalable and, unlike relational databases, scales linearly; a Hadoop cluster can contain tens, hundreds, or even thousands of servers.
- Implemented Spark batch applications using Scala for performing various kinds of cleansing, de-normalization, and aggregation.
- Implemented techniques like forward selection, backward elimination, and stepwise approaches for selecting the most significant independent variables (an illustrative Python sketch follows this section).
- Performed feature selection and feature extraction (dimensionality reduction) methods to identify significant variables.
- Performed Data Profiling, a control-flow component in SQL Server Integration Services (SSIS).
- Used RMSE score, confusion matrix, ROC, cross-validation, and A/B testing to evaluate model performance in both simulated and real-world environments.
- Performed Exploratory Data Analysis using R. Also involved in generating various graphs and charts for analyzing the data using Python libraries.
- Involved in the execution of multiple business plans and projects; ensured business needs were met; interpreted data to identify trends that extend across future data sets.
- Developed interactive dashboards and created various ad hoc reports for users in Tableau by connecting various data sources.
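Illustrative sketch (referenced above): a minimal scikit-learn example of forward selection and backward elimination for variable selection, with cross-validated RMSE as described in this section; the dataset is synthetic and the parameters are placeholders, not the project's actual configuration.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for the real business dataset.
X, y = make_regression(n_samples=500, n_features=15, n_informative=5,
                       noise=10.0, random_state=7)

base = LinearRegression()

# Forward selection: grow the feature set one variable at a time.
forward = SequentialFeatureSelector(base, n_features_to_select=5,
                                    direction="forward", cv=5)
forward.fit(X, y)
print("forward-selected features:", np.flatnonzero(forward.get_support()))

# Backward elimination: start from all features and drop the weakest.
backward = SequentialFeatureSelector(base, n_features_to_select=5,
                                     direction="backward", cv=5)
backward.fit(X, y)
print("backward-selected features:", np.flatnonzero(backward.get_support()))

# Cross-validated RMSE of the model on the forward-selected subset.
scores = cross_val_score(base, forward.transform(X), y, cv=5,
                         scoring="neg_root_mean_squared_error")
print("cv RMSE:", -scores.mean())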
Environment: Python, SQL server, Hadoop, HDFS, HBase, MapReduce, Hive, Impala, Pig, Sqoop, Mahout, Spark MLLib, MongoDB, Tableau, ETL, Unix/Linux.
SQL Developer
Confidential
Responsibilities:
- Extensively experienced working on different Data Flow and Control Flow tasks, For Loop containers, Sequence containers, Script tasks, Execute SQL tasks, and package configuration.
- Created new procedures to handle complex logic for business and modified already existing stored procedures, functions, views and tables for new enhancements of the project and to resolve the existing defects.
- Loaded data from various sources such as OLE DB and flat files into a SQL Server 2012 database using SSIS packages, and created data mappings to load the data from source to destination.
- Created batch jobs and configuration files to build automated processes using SSIS.
- Created SSIS packages to pull data from SQL Server and exported to Excel Spreadsheets and vice versa.
- Built SSIS packages to fetch files from remote locations such as FTP and SFTP, decrypt them, transform them, load them into the data warehouse, and provide proper error handling and alerting.
- Extensive use of Expressions, Variables, and Row Count in SSIS packages.
- Performed data validation and cleansing of staged input records before loading into the data warehouse.
- Automated the process of extracting the various files like flat/excel files from various sources like FTP and SFTP (Secure FTP).
- Deployed and scheduled reports using SSRS to generate daily, weekly, monthly, and quarterly reports.
Environment: MS SQL Server 2005 & 2008, SQL Server Business Intelligence Development Studio, SSIS-2008, SSRS-2008, Report Builder, Office, Excel, Flat Files, .NET, T-SQL
TECHNICAL SKILLS:
Databases: Oracle (10g/11g), MySQL, SQLite, NoSQL, Vertica, RDBMS, SQL Server 2014, HBase 1.2, MongoDB 3.2, Teradata, Netezza.
Database Tools: PL/SQL Developer, Toad, SQL Loader, Erwin.
Web Programming: HTML, CSS, XML, JavaScript.
Programming Languages: R, Python, SQL, Apache Spark, Scala, UNIX, C, C#, ASP.NET, Java, Tableau
DWH BI Tools: DataStage 9.1/11.3, Tableau Desktop, D3.js, Excel
Machine Learning: Regression, clustering, SVM, Decision trees, Classification, Recommendation systems, Association Rules
Data Visualization: QlikView, Tableau 9.4/9.2, ggplot2 (R), D3
Big Data Frameworks: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, ZooKeeper, Flume, HBase, Amazon (EC2, S3, and Redshift), Spark, Storm, Impala, Talend 6, DMX-h.
Technologies/Tools: Azure Machine Learning, SPSS, Rattle, Caffe, TensorFlow, Theano, Torch, Keras, NumPy.
Scripting Languages: jQuery, AngularJS
Scheduling Tools: Autosys, Control-M.