Data Scientist Resume

Pittsburgh, PA

SUMMARY

  • 12+ years of experience in data warehousing, business intelligence, and data science (big data analytics) technologies
  • 4+ years of experience in big data analytics, with hands-on experience in data extraction, data analysis, data loading, and data visualization using the Cloudera platform (Sqoop, Flume, Pig, Hive, HBase, Spark), R, and other platforms
  • Domain experience in Banking, Insurance, Retail, Telecom, Revenue Authority
  • Experience in data science, including collecting and cleaning data, exploratory data analysis, developing predictive models with machine learning algorithms, and creating visualizations to support decision making
  • Hands-on experience importing and exporting data between HDFS and relational database systems using Sqoop, and analyzing data using Hive, Impala, and Pig Latin scripts
  • Experience in acquiring structured and unstructured data from a variety of sources, including relational databases and web scraping, and loading it into distributed storage such as HDFS on a Hadoop platform
  • Proficient in Spark: loading data from the local file system, HDFS, Amazon S3, and relational and NoSQL databases using Spark SQL, importing data into RDDs, and ingesting data from a range of sources using Spark Streaming
  • Proficient in R Programming Language, Data extraction, Data cleaning, Data Loading, Data Transformation, Predictive Modeling and Data visualization using R and Tableau
  • Hands-on experience with R and machine learning algorithms, including clustering (such as k-means), random forests, decision trees, time-series analysis, regression, and association rules
  • 11+ years of experience in DWH and business intelligence implementations using Oracle ETL and BI tools
  • Designed Enterprise Data Warehouse, Dimensional Models and BI Reporting Solutions
  • 2+ Years of experience in ERP Applications that include SAP FICO & Oracle Financials Modules
  • 11+ Years of experience in retail banking operations

TECHNICAL SKILLS

Big Data - Hadoop Ecosystem: Cloudera Platform - Sqoop, Flume, Pig, Hive, HBase and Spark

RDBMS: Oracle 8.x/9.x/10.x/11.x, MySQL 5.x; NoSQL: MongoDB

ETL & BI Tools: OWB, Oracle BI Tools, Tableau 7.x/8.x

Data Modeling Tools: Oracle Data Modeler, Erwin

Programming Languages: R, Python

OS: Windows 7/8/8.1, Linux

Other Tools: MS Project, MS Office Suite, AWS - Cloud Computing

PROFESSIONAL EXPERIENCE

Confidential, Pittsburgh, PA

Data Scientist

Responsibilities:

  • Worked with project team to understand the problem and business requirements
  • Worked with developers to extract data from HDFS to Spark shell for analysis
  • Imported data into R for exploring and understanding data
  • Explored the data and data structures to inform model development
  • Prepared data for creating training and test sets
  • Developed a credit risk model to identify risky bank loans using a decision tree algorithm
  • Communicated results using presentations and visualizations

Environment: Linux, Hadoop, Hive, MySQL, Spark, R, RStudio, Tableau

Confidential, New York

Data Scientist

Responsibilities:

  • Helped extract data from source systems into HDFS
  • Imported data from HDFS to Hive using Sqoop
  • Prepared data for exploratory analysis using data munging
  • Segmented data by implementing the k-means algorithm
  • Developed and evaluated the model and improved its performance
  • Deployed the model in the production environment
  • Created visualizations using R

Environment: Linux, Hadoop, MySQL, R, RStudio

Confidential, Sterling, VA

Data Scientist

Responsibilities:

  • Gathered requirements for various data mining projects
  • Worked with other team members to develop Hive/Impala scripts for extraction, transformation, and loading of data
  • Helped load data from Hive into R for data analysis and visualization
  • Prepared data and performed exploratory analysis to support development of machine learning models
  • Created standard data summaries, extracted subsets of data, and split data into partitions
  • Created various types of data visualizations using R and Tableau

Environment: CDH4, HDFS, Pig, Hive, Impala, Sqoop, Linux, R, Tableau Desktop, Tableau Server

Confidential, Columbus, OH

Data Scientist

Responsibilities:

  • Helped load data from HDFS into Hive using Sqoop for querying with Hive
  • Applied performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins
  • Created HBase tables to store data in various formats coming from different portfolios
  • Used Flume extensively to gather and move log data files from application servers to a central location in the Hadoop Distributed File System (HDFS)
  • Helped import data from Hive into R for data exploration and data cleaning to develop predictive models as per requirements
  • Developed predictive models for customer segmentation in marketing using R

Environment: Hadoop, Java, UNIX, HDFS, Pig, Hive, MapReduce, Sqoop, HBase, Linux, Flume, R
