Data Preprocessing And Data Analysis Resume
TECHNICAL SKILLS
- Programming Languages: C/C++, Scala, R, Python, Java, CSS/CSS3, HTML, jQuery, Visual Basic, VBScript, JavaScript, XML
- Database: MySQL, Microsoft SQL Server, MS Access, Oracle 10g
- IDE: Eclipse, Jupyter, RStudio
- Big Data: Spark, Hadoop, MapReduce, Hive, Data Mining, Hortonworks, Machine learning, Predictive analysis, Data analysis, Data crawling
- Tools: Weka, UFT (Unified Functional Testing), ALM (Application Lifecycle Management)
- OS: Windows, Ubuntu, CentOS
PROFESSIONAL EXPERIENCE
Confidential
Data preprocessing and Data analysis
Responsibilities:
- Performed ETL (Extract, Transform, Load) of data from web applications, a mainframe terminal emulator (BlueZone), SOAP services, and URLs using scripting; used Excel for data manipulation
- Gathered requirements from clients and analyzed the business to build ETL scripts
- Increased the efficiency of the data analysis scripts and reduced manual data extraction work by 70%
- Used UFT (Unified Functional Testing) with VBScript and ALM (Application Lifecycle Management) for automation and functional testing
- Demonstrated strong verbal and written communication skills as the sole resource handling the module
Confidential
Software Developer
Responsibilities:
- Developed web pages using Java, JavaScript, jQuery, CSS, and HTML
- Created REST web services in Java for web pages and an iOS application, using Microsoft SQL Server Management Studio 2008 for database connectivity
- Led a team to complete CUSTARD (Cloud User Security System for Transactions through Accounting and Authentication of Resource Delivery) using the Eclipse Java IDE, Apache Tomcat, and SQL
- 1+ years of academic research experience using machine learning, data mining, and statistical methods
- Engineered features from the most recent and previous offenses and computed the time to recidivate using statistics
- Applied Logistic Regression and Random Forest to predict recidivism of inmates within 3 years, achieving an accuracy of 68% with a false positive rate of 15% and true negative rate of 20%
- Performed statistical data analysis and association rule mining to determine patterns of recidivism using Hadoop, Hive, Scala, Python, R, pandas, matplotlib, NumPy, PySpark, Spark, Spark MLlib, and the Hortonworks Data Platform
