Data Scientist Resume
Charlotte, NC
PROFESSIONAL SUMMARY:
- Dedicated IT professional with 10+ years of experience in Data Analysis, Data Profiling, Data Integration, Data Visualization, Predictive Modeling, Metadata Management, Business Requirement Analysis, Quality Assurance, Design, Development, and Testing in support of various organizations' business goals and objectives
- Extensive experience applying Machine Learning solutions to various business problems and generating data visualizations using Python and R
- Hands-on experience extracting and modeling datasets from a variety of data sources such as Hadoop (using Pig, Hive, and Spark), Teradata, and Snowflake for ad hoc analysis, with a solid understanding of Agile methodology and practice
- Advanced knowledge of statistical techniques in Sampling, Probability, Correlation, Multivariate Data Analysis, PCA, and Time-Series Analysis, and application of statistical concepts using SAS and SPSS
- Hands-on experience implementing Web Services technologies such as SOAP, WSDL, and REST APIs
- Advanced working knowledge of SQL and experience working with, and optimizing queries against, a variety of relational databases such as PostgreSQL, MySQL, Microsoft SQL Server, Teradata, IBM DB2, and Oracle
- Proficient in data entry, data auditing, creating data reports, and monitoring data for accuracy; able to perform web search and data collection, web data mining, website data extraction, and data processing
- Expertise in programming languages such as Python, Scala, R, SAS, and Java
- Experience with AWS technologies such as Redshift, S3, EC2, SageMaker, and EMR
- Extensive experience working with Tableau Desktop 9.0 and 10.0 along with Tableau Server
- Knowledge of claims billing (procedure codes, modifiers, diagnoses)
- Excellent understanding of the Software Development Life Cycle (SDLC), with good working knowledge of testing methodologies, disciplines, tasks, resources, and scheduling
- Extensive working knowledge of Microsoft Excel for Data Analysis
- Expertise in Data Reporting, Ad Hoc Reporting, Graphs, Scales, Pivot Tables, and OLAP reporting
SKILLS:
Machine Learning: Classification, Regression, Clustering, Association, Logistic Regression, Decision Trees, Random Forest, Naive Bayes, K-Nearest Neighbors (K-NN), Kernel SVM
Big Data: Pig, Hive, Spark, HBase, Kafka, Sqoop, Cassandra, MongoDB
Databases: Oracle, MySQL, MS SQL Server, Postgres
Analytics: Python, R, SAS, SPSS, SSIS, Informatica, Excel
Visualization: Tableau and PowerBI
Operating Systems: Windows, Unix/Linux, Mac
Design Methodologies: Agile, Scrum, Waterfall
IDEs: Jupyter, Spyder
WORK EXPERIENCE:
Confidential, Charlotte, NC
Data Scientist
Responsibilities:
- Working on the Data and Analytics (Data Science) team, helping the Marketing and Finance teams analyze various data sources to promote the growth of insurance products.
- Worked with the Marketing team to provide insights on product sellers using Machine Learning algorithms.
- Worked on tuning propensity models to predict customer behavior and score the customers most valuable to the business.
- Experience in Dimensionality Reduction, Model Selection, and Model Boosting methods using Principal Component Analysis (PCA), K-Fold Cross-Validation, and Gradient Tree Boosting.
- Involved in extracting data from various sources and performed data cleansing, data integration, data transformation, data mapping, and loading of data into Hadoop with Apache Spark using PySpark and Spark SQL (see the sketch after this job entry).
- Wrote Spark SQL scripts to query multiple tables, perform joins, and create tables to load data into Hive.
- Wrote a hashing routine to secure PII data and stored the data in Hadoop.
- Performed various ETL jobs for data transformations and data warehousing using Informatica.
- Worked on SAP Big Data Services and ingested data from Oracle and MySQL databases using Sqoop.
- Prepared analysis reports and dashboards for the clients using Tableau.
- Worked with Data Engineering, Data Governance and Data Management teams to deliver the client requirements.
Environment: Python, Spark, Hive, HDFS, Informatica, SSIS, Oracle, MySQL, Tableau and Git (VSTS)
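A minimal sketch of the PySpark flow described above, assuming hypothetical paths, table names, and PII columns (ssn, email); pyspark.sql.functions.sha2 stands in for the hashing routine, which the original does not show:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sha2

spark = SparkSession.builder.appName("pii-etl").enableHiveSupport().getOrCreate()

# Extract raw data from the landing zone (path is illustrative)
raw = spark.read.parquet("/data/raw/customers")

# Secure PII columns with a 256-bit SHA-2 hash before persisting
secured = (raw
    .withColumn("ssn", sha2(col("ssn"), 256))
    .withColumn("email", sha2(col("email"), 256)))

# Join against a reference table with Spark SQL and load the result into Hive
secured.createOrReplaceTempView("customers_secured")
result = spark.sql("""
    SELECT c.*, p.product_name
    FROM customers_secured c
    JOIN products p ON c.product_id = p.product_id
""")
result.write.mode("overwrite").saveAsTable("analytics.customer_products")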
Confidential, Atlanta, GA
Data Scientist
Responsibilities:
- Worked on data cleaning, data preparation, and feature engineering with Python, including NumPy, SciPy, Matplotlib, Seaborn, Pandas, and scikit-learn.
- Worked on various business cases using machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-Means, and KNN for data analysis.
- Built an object-oriented framework to easily allow construction of multi-layer ensemble machine-learning models, using scikit-learn, XGBoost, Theano, and other Python toolkits (see the sketch after this job entry).
- Hands-on experience in Dimensionality Reduction, Model Selection, and Model Boosting methods using Principal Component Analysis (PCA), K-Fold Cross-Validation, and Gradient Tree Boosting.
- Coordinated the execution of A/B tests to measure the effectiveness of a personalized recommendation system.
- Advanced knowledge of statistical techniques in Sampling, Probability, Multivariate Data Analysis, PCA, and Time-Series Analysis using SAS.
- Created Hive queries that helped market analysts spot emerging trends by comparing incremental data with Teradata tables and historical metrics.
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Wrote several Teradata SQL queries using Teradata SQL Assistant for ad hoc data pull requests.
- Performed statistical data analysis and data visualization using R and Python.
- Maintained large data sets, combining data from various sources using Excel, SAS (Enterprise and Grid), Access, and SQL queries.
- Performed analysis for implementing Spark using Scala and wrote Spark sample programs using PySpark.
- Worked on creating filters and calculated sets for preparing dashboards and worksheets in Tableau.
- Created data models in Splunk using pivot tables, analyzing vast amounts of data and extracting key information to suit various business requirements.
- Implemented data refreshes on Tableau Server for biweekly and monthly increments based on business change to ensure that the views and dashboards were displaying the changed data accurately.
- Developed Teradata SQL scripts using OLAP functions such as RANK() OVER to improve query performance when pulling data from large tables.
- Designed and developed ETL processes using Informatica for dimension and fact file creation.
Environment: SQL, Teradata, Informatica, SAS, Hive, Tableau, Python, TensorFlow, Keras, Git and Google Cloud Platform.
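A minimal sketch of the kind of multi-layer ensemble and K-Fold validation described above; the synthetic dataset, parameters, and scikit-learn's StackingClassifier are illustrative stand-ins for the in-house framework:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBClassifier

# Synthetic data standing in for the business dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Two-layer ensemble: tree-based base learners feeding a logistic meta-learner
ensemble = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("xgb", XGBClassifier(n_estimators=200, eval_metric="logloss")),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)

# K-Fold cross-validation for model selection
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(ensemble, X, y, cv=cv, scoring="roc_auc")
print(f"Mean AUC: {scores.mean():.3f} +/- {scores.std():.3f}")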
Confidential
Data Analyst
Responsibilities:
- Involved in evaluating customer credit data and financial statements in order to determine the degree of risk involved in lending money.
- Developed predictive solutions to support the commercial banking team using machine learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, and Support Vector Machines in Python.
- Conducted analysis in assessing customer behaviors with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
- Evaluated parameters with K-Fold Cross-Validation and grid search methods to optimize model performance (see the sketch after this job entry).
- Worked on data cleaning, data preparation and feature engineering with Python, including NumPy, SciPy, Matplotlib, Seaborn, Pandas, and Scikit-learn.
- Implemented Agile Methodologies, Scrum stories and sprints in a Python based environment, along with data analytics and Excel data extracts.
- Designed and built high-volume, real-time data ingestion frameworks and automated feeds from various data sources into big data technologies such as Hadoop.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data on HDFS.
- Used Hive, created Hive tables, and was involved in data loading and writing Hive UDFs.
- Experience designing and optimizing complex SQL queries involving table joins using MySQL.
- Worked in a Tableau environment to create weekly, monthly, and daily reports using Tableau Desktop and publish them to Tableau Server.
- Worked on importing and exporting data from Oracle into HDFS using Sqoop.
- Worked on Excel using VLOOKUP, pivots, conditional formatting, large record sets, data manipulation and cleaning.
- Used GitHub for version control to manage the source code and keep track of changes to files in a fast, lightweight system.
Environment: Python, MySQL, SAS, Pig, HDFS, Hive, Excel, Tableau and GIT.
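A minimal sketch of the grid search with K-Fold validation described above, using hypothetical file and column names for a credit-risk dataset:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

# Hypothetical credit-risk features and default label
df = pd.read_csv("credit_data.csv")  # illustrative file name
X = df[["income", "debt_ratio", "credit_history_len"]]
y = df["defaulted"]

# Search the parameter grid with 5-fold cross-validation
param_grid = {"n_estimators": [100, 300], "max_depth": [5, 10, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=42),
    scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)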
Confidential
Data Analyst
Responsibilities:
- Worked for the Risk Management team, identifying the risk involved in the mortgage process by evaluating customer and property records.
- Involved in addressing a wide range of challenging problems using techniques from applied statistics, machine learning and data mining fields.
- Involved in providing insights using machine learning algorithms tailored to particular needs and evaluated on large data sets using caret, e1071, rpart, randomForest, glmnet, gbm, mboost, and arules in R.
- Integrated new tools and developed technology frameworks/prototypes to accelerate the data integration process and support the deployment of predictive analytics by developing Spark Scala modules with R.
- Experience in descriptive statistics and hypothesis testing using Chi-square, t-tests, Pearson correlation, and analysis of variance (ANOVA); see the sketch after this job entry.
- Advanced knowledge of statistical techniques in Sampling, Probability, Multivariate data analysis, PCA, and Time-series analysis using SAS.
- Experience in transferring and managing data to Hadoop clusters using Kafka, Sqoop, Oozie and Zookeeper.
- Involved in fixing invalid mappings, testing stored procedures and functions, and unit and integration testing of Informatica sessions, batches, and the target data.
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Analyzed Relational & Non-relational data using MySQL and HBase.
- Contributed across Technology, Project Management, and Business Management functions to push the business forward with innovative solutions.
- Performed data acquisition and exploratory data analysis in R.
- Visualized team metrics and communicated them to senior management using PowerBI.
Environment: R, MySQL, Hive, HBase, Spark, PowerBI and GIT.
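The work above used R and SAS; as a minimal illustration in Python (the language used for the other sketches in this document), the same hypothesis tests can be run with scipy.stats on synthetic samples:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic samples standing in for two borrower segments
group_a = rng.normal(loc=700, scale=50, size=200)  # e.g., credit scores
group_b = rng.normal(loc=690, scale=55, size=200)

# Two-sample t-test: do the segment means differ?
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Pearson correlation between two paired measures
r, r_p = stats.pearsonr(group_a, group_b)

# Chi-square test of independence on an illustrative 2x2 table
table = np.array([[120, 80], [90, 110]])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

# One-way ANOVA across three segments
group_c = rng.normal(loc=710, scale=45, size=200)
f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)

print(t_p, r_p, chi_p, f_p)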
Confidential
Data Analyst
Responsibilities:
- As part of a project, worked on HTML, CSS and JavaScript to develop a web application for online applications.
- Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Involved in development of General Ledger module, which streamlined analysis, reporting and recording of accounting information.
- Managed connectivity using JDBC for querying/inserting and data management including triggers and stored procedures.
- Involved in maintaining student records in the database using Oracle.
- Resolved data-related issues such as assessing data quality, data consolidation, and evaluating existing data sources using MS Excel.
- Involved in preparing weekly reports for the management using Tableau.
Environment: HTML, CSS, Oracle, MS Excel, Tableau.