Data Scientist Resume
Pittsburgh, PA
SUMMARY
- 12+ Years of Experience in Data Warehouse, Business Intelligence and Data Science - Big Data Analytics technologies
- 4+ Years of experience in Big Data Analytics with hands-on experience in Data Extraction, Data Analysis, Data Loading, and Data Visualization using the Cloudera Platform (Sqoop, Flume, Pig, Hive, HBase, Spark), R, and other platforms
- Domain experience in Banking, Insurance, Retail, Telecom, Revenue Authority
- Experience in data science, including collecting and cleaning data, performing exploratory data analysis, developing predictive models with machine learning algorithms, and creating visualizations to support decision-making
- Hands-on experience importing and exporting data between HDFS and relational database systems using Sqoop, and analyzing data using Hive, Impala, and Pig Latin scripts
- Experience in acquiring structured and unstructured data from a variety of sources, including relational databases and web scraping, and loading it into distributed storage such as HDFS on a Hadoop platform
- Proficiency in Spark for loading data from the local file system, HDFS, Amazon S3, and relational and NoSQL databases; querying with Spark SQL; importing data into RDDs; and ingesting data from a range of sources using Spark Streaming
- Proficient in the R programming language for data extraction, cleaning, loading, transformation, and predictive modeling, and in data visualization using R and Tableau
- Hands-on experience with R and machine learning algorithms, including clustering (such as k-means), random forests, decision trees, time-series analysis, regression, and association rules
- 11+ Years of experience in Data Warehouse (DWH) and Business Intelligence implementations using Oracle ETL and BI tools
- Designed Enterprise Data Warehouse, Dimensional Models and BI Reporting Solutions
- 2+ Years of experience in ERP Applications that include SAP FICO & Oracle Financials Modules
- 11+ Years of experience in retail banking operations
TECHNICAL SKILLS
Big Data - Hadoop Ecosystem: Cloudera Platform - Sqoop, Flume, Pig, Hive, HBase and Spark
RDBMS: Oracle 8.x/9.x/10.x/11.x, MySQL 5.x; NoSQL: MongoDB
ETL & BI Tools: OWB, Oracle BI Tools, Tableau 7.x/8.x
Data Modeling Tools: Oracle Data Modeler, Erwin
Programming Language: R, Python
OS: Windows 7/8/8.1, Linux
Other Tools: MS Project, MS Office Suite, AWS - Cloud Computing
PROFESSIONAL EXPERIENCE
Confidential, Pittsburgh, PA
Data Scientist
Responsibilities:
- Worked with project team to understand the problem and business requirements
- Worked with developers to load data from HDFS into the Spark shell for analysis
- Imported data into R for exploring and understanding data
- Explored the data and data structures to inform model development
- Prepared data for creating training and test sets
- Developed a credit risk model to identify risky bank loans using a decision tree algorithm
- Communicated results using presentations and visualizations
Environment: Linux, Hadoop, Hive, MySQL, Spark, R, RStudio, Tableau
Confidential, New York
Data Scientist
Responsibilities:
- Extracted data from source systems to HDFS
- Imported data from HDFS into Hive using Sqoop
- Prepared data for exploratory analysis using data munging
- Segmented data by implementing the k-means algorithm
- Developed and evaluated the model, and improved its performance
- Deployed the model to the production environment
- Created visualizations using R
Environment: Linux, Hadoop, MySQL, R, RStudio
Confidential, Sterling, VA
Data Scientist
Responsibilities:
- Gathered requirements for various data mining projects
- Worked with other team members to develop Hive/Impala scripts for extraction, transformation, and loading of data
- Loaded data from Hive into R for data analysis and visualization
- Prepared data and performed exploratory analysis to support machine learning model development
- Created standard data summaries, extracted subsets of data, and split the data into partitions
- Created various types of data visualizations using R and Tableau
Environment: CDH4, HDFS, Pig, Hive, Impala, Sqoop, Linux, R, Tableau Desktop, Tableau Server
Confidential, Columbus, OH
Data Scientist
Responsibilities:
- Loaded data from HDFS into Hive using Sqoop for querying with Hive
- Optimized performance by using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins
- Created HBase tables to store data in various formats coming from different portfolios
- Used Flume extensively to gather and move log data files from application servers to a central location in the Hadoop Distributed File System (HDFS)
- Imported data from Hive into R for data exploration and data cleaning to develop predictive models as per requirements
- Developed predictive models in R for customer segmentation in marketing
Environment: Hadoop, Java, UNIX, HDFS, Pig, Hive, MapReduce, Sqoop, HBase, Linux, Flume, R
