Data Scientist Resume
PROFESSIONAL SUMMARY
- 6.5 years of experience in Machine Learning, Predictive Analytics, Data Mining and ETL Development
- Architected solutions for problems in Machine Learning, Advanced Predictive Analytics, Business Optimization and Text Mining
- Strong understanding of Big Data concepts: HDFS, Map-Reduce and the Hadoop ecosystem
- Good knowledge of SQL and PL/SQL
- Knowledge of Object-Oriented Programming concepts
- Proven ability as a quick learner of new skills and technologies
- Effective team player with excellent communication skills and the insight to determine priorities, schedule work and meet critical deadlines
- Ability to rapidly troubleshoot and resolve complex technical issues
- Strong analytical, problem-solving, programming and debugging skills
TECHNICAL SKILLS
Languages: R, Python, Core Java, C, PL/SQL
Platform/Technologies: HDFS, Map-Reduce, Hive, HBase, Mahout, NLTK
Tools: Talend 4.1.2, Quick Rec, TortoiseSVN, PuTTY, Cloudera Manager
Essential Skills: Text Mining, Data Mining, Data Modeling, Predictive Analytics, Statistical Modeling, Big Data, Advanced Analytics Algorithms
PROFESSIONAL EXPERIENCE
Confidential
Data Scientist
Responsibilities:
- Collecting data and dividing it into Training, Testing and Evaluation sets
- Designing, Architecting and Choice of Algorithms
- Model Building using Random Forest and Gradient Boosting methods such as XGBoost
- Feature Selection from the fitted model
- Evaluation of the Model built
- Visualization of the Output
- Analyzing the False Negatives and False Positives
Environment: R
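The data split and the false negative / false positive analysis listed above can be sketched roughly as follows. This is an illustration only, written in Python to match the other examples in this document (the role itself used R). The function names, the 60/20/20 split fractions and the sample labels are all hypothetical and are not taken from the project.

```python
import random


def split_dataset(records, train_frac=0.6, test_frac=0.2, seed=42):
    """Shuffle records and split them into training, testing and evaluation sets.

    The 60/20/20 fractions are an assumption for illustration.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    evaluation = shuffled[n_train + n_test:]  # whatever remains
    return train, test, evaluation


def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


# Hypothetical records and labels, for illustration only.
records = list(range(10))
train, test, evaluation = split_dataset(records)

actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
counts = confusion_counts(actual, predicted)
```

The rows counted under "fp" and "fn" are the ones that would be pulled out for manual review.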
Confidential
Analytics Engineer
Responsibilities:
- Data collection: Scraping the Client Web site for posts
- Analyzing the data, performing Text Pre-processing Steps
- Model Building
- Cross-checking the end results
- Sharing the results with the Customer
- Visualize the output
Environment: Revo R, DeployR
Confidential
Analytics Engineer
Responsibilities:
- Build the flow of the Project
- Coordinate with the Business Analyst to understand the requirements
- Process the data into a format the model can read
- Build the Prediction Model
- Score the Model and identify Risky Customers
- Visualize the Output
Environment: R, Java, Revo R, DeployR
Confidential
Analytics Engineer
Responsibilities:
- Deterministic Chain Ladder
- Chain Ladder as Weighted Mean
- Chain Ladder using weighted linear regression
- Poisson Regression
- Quasi-Poisson Regression
- Bootstrapped Chain Ladder
- Mack Chain Ladder
- Log-Linear Model
- Clark LDF Method
- Build the flow of the Project
- Coordinate with the Business Analyst to understand the requirements
- Process the data into a format the model can read
- Build the Prediction Model using different Chain Ladder Techniques
- Build the Prediction Model using Time Series Method
- Compare and Visualize the Output
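As an illustration of the deterministic (volume-weighted) Chain Ladder technique listed above, the following minimal sketch computes age-to-age development factors from a cumulative run-off triangle and projects each origin year to ultimate. It is written in plain Python for readability; the function names and the triangle values are hypothetical and are not taken from the project.

```python
def development_factors(triangle):
    """Volume-weighted age-to-age factors from a cumulative run-off triangle.

    Each row is one origin year; later origin years have fewer observed
    development periods, so rows get shorter going down.
    """
    n_dev = len(triangle[0])
    factors = []
    for j in range(n_dev - 1):
        # Use only origin years where both period j and period j + 1 are observed.
        rows = [row for row in triangle if len(row) > j + 1]
        numerator = sum(row[j + 1] for row in rows)
        denominator = sum(row[j] for row in rows)
        factors.append(numerator / denominator)
    return factors


def project_ultimates(triangle, factors):
    """Roll the latest observed value of each origin year forward to ultimate."""
    ultimates = []
    for row in triangle:
        value = row[-1]
        for factor in factors[len(row) - 1:]:
            value *= factor
        ultimates.append(value)
    return ultimates


# Hypothetical cumulative claims triangle, for illustration only.
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 165.0],
    [120.0],
]
factors = development_factors(triangle)
ultimates = project_ultimates(triangle, factors)
reserves = [u - row[-1] for u, row in zip(ultimates, triangle)]
```

The other variants in the list (Mack, bootstrapped, Poisson-based) start from the same triangle but add a stochastic model around these factors, which is what makes their outputs comparable.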
Text Classification
Confidential
Responsibilities:
- First approach: a probability-based Naïve Bayes classifier, which uses Bayes' theorem to predict the probability that a given feature set belongs to a particular label.
- Second approach: a linguistic, knowledge-based model that identifies context and then applies a Nearest-Neighbor classifier for text classification. The model represents text in terms of synsets in WordNet, a lexical knowledge base of English words and their semantic relations. WordNet similarity is measured between each word of the input (the tweet) and the words in a manually prepared medical dictionary. Once the relatedness of a tweet to the medical terms in the dictionary has been captured, the data is ready for the K-Nearest-Neighbor classifier.
- Collecting data and dividing it into Training, Testing and Evaluation sets
- Manual labeling of the class for Training and Testing
- Designing, Architecting and Choice of Algorithms
- Model Building in Python
- Evaluation of the Model built
- Visualization of the Output in Python
- Analyzing the False Negatives and False Positives
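The probability-based approach described above can be sketched as a small multinomial Naïve Bayes classifier. This is a minimal illustration using only the Python standard library, with add-one (Laplace) smoothing and whitespace tokenization as simplifying assumptions. The function names, the labels "medical" and "other", and the sample tweets are hypothetical and do not come from the project data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-label document counts and word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(text, model):
    """Return the label with the highest posterior log-probability."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, doc_count in label_counts.items():
        log_prob = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            # Add-one smoothing so unseen (label, word) pairs do not zero out the score.
            log_prob += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = log_prob
    return max(scores, key=scores.get)


# Hypothetical manually labeled tweets, for illustration only.
training_data = [
    ("fever and headache since morning", "medical"),
    ("doctor prescribed antibiotics for the infection", "medical"),
    ("great match at the stadium tonight", "other"),
    ("new phone arrived today", "other"),
]
model = train_naive_bayes(training_data)
first_label = predict("headache and fever again", model)
second_label = predict("phone arrived at the stadium", model)
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.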
Confidential
ETL Developer
Roles & Responsibilities:
- Offshore lead role for E2E AML RECON
- Involved in requirement gathering and analysis of the raw data
- Involved in developing ETL jobs in Talend that apply rules to raw transaction files for various Product Processors
- Involved in Unit testing of the ETL jobs created
- Involved in preparing Unit Test Case documents for each of the ETL jobs created
- Involved in running the QuickRec tool to verify the differences between the Expected and the Actual results
- Involved in Break Analysis of the output to determine whether the correct rules have been applied
- Interacting with clients on Break Analysis
Technology: Core Java, SQL, UNIX, Talend 4.1.2, Quick Rec, TortoiseSVN, PuTTY