- Around seven years of professional experience in IT, including 3+ years of extensive experience in Data Mining, Machine Learning and Spark development with big datasets of structured and unstructured data, data acquisition, data validation, predictive modeling and data visualization. Proficient in statistical programming languages like R and Python.
- Proficient in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross validation, and data visualization.
- Extensive experience in Text Analytics, developing various statistical machine learning and data mining solutions to business problems and generating data visualizations using R, Python and Tableau.
- Deep understanding of statistical modeling, multivariate analysis, model testing, problem analysis, model comparison and validation.
- Skilled in data parsing, manipulation and preparation, including describing data contents, computing descriptive statistics, regex matching, split and combine, remap, merge, subset, reindex, melt and reshape.
- Experience in using various packages in R and libraries in Python.
- Working knowledge of Hadoop, Hive and NoSQL databases such as Cassandra and HBase.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, clustering, neural networks and Principal Component Analysis, with good knowledge of Recommender Systems.
- Good industry knowledge, analytical and problem solving skills and ability to work well within a team as well as an individual.
- Highly creative, innovative, committed, intellectually curious, business savvy with good communication and interpersonal skills.
- Ability to adapt quickly to a new work pace and to learn new skills.
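The data-preparation methods above can be illustrated with a short pandas sketch (the dataset and column names are hypothetical):

```python
import pandas as pd

# Hypothetical quarterly sales data to illustrate the preparation steps above
df = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "q1": [100, 90, 110, 95],
    "q2": [120, 85, 130, 100],
})

stats = df.describe()                  # descriptive statistics of numeric columns
long = df.melt(id_vars="region",       # melt/reshape from wide to long format
               var_name="quarter", value_name="sales")
high = long[long["sales"] > 100]       # subset rows by condition
by_region = long.groupby("region")["sales"].sum().reset_index()  # aggregate
combined = long.merge(by_region, on="region", suffixes=("", "_total"))  # merge
```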
Data Scientist/Machine Learning Engineer
Confidential, San Jose, CA
- Analyzed the trading mechanism for real-time transactions and built collateral management tools.
- Compiled data from various sources to perform complex analysis for actionable results.
- Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means and KNN for data analysis.
- Developed MapReduce/Spark R modules for machine learning and predictive analytics in Hadoop on AWS and Azure.
- Pipelined (ingest/clean/munge/transform) data for feature extraction toward downstream classification.
- Measured efficiency of the Hadoop/Hive environment, ensuring SLAs were met.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster data processing.
- Prepared the Spark build from source code and ran Pig scripts using Spark rather than MapReduce jobs for better performance.
- Analyzed the system for new enhancements/functionalities and performed impact analysis of the application for implementing ETL changes.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
- Wrote Hive queries for data analysis to meet the business requirements
- Developed Kafka producer and consumers for message handling
- Responsible for analyzing multi-platform applications using Python.
- Performed Apache Hadoop installation and configuration of multiple nodes on AWS EC2.
- Used Storm as an automatic mechanism to analyze large amounts of non-unique data points with low latency and high throughput.
- Migrated servers, databases and applications from on-premise to AWS and Azure.
- Developed MapReduce jobs in Python for data cleaning and data processing
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark-SQL, DataFrames, pair RDDs and Spark YARN.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
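As a minimal sketch of the MapReduce-style Python data-cleaning jobs described above (the records and keys are hypothetical, and a real job would run under Hadoop Streaming rather than in-process):

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Cleaning step: skip malformed rows, emit (key, value) pairs."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) == 2 and parts[1].isdigit():
            yield parts[0], int(parts[1])

def reducer(pairs):
    """Sum values per key, mirroring a Hadoop reduce phase."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, sum(value for _, value in group)

# Hypothetical raw records, including one malformed row to be cleaned out
raw = ["visits,3", "corrupt_row", "visits,4", "clicks,5"]
result = dict(reducer(mapper(raw)))
```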
Environment: Machine learning, AWS, MS Azure, Cassandra, Scala, Spark, HDFS, Hive, Pig, Linux, Python, MySQL, Eclipse, PL/SQL, SQL connector
Data Scientist/Machine Learning Engineer
- Performed Data Profiling to learn about user behaviour and merged data from multiple data sources.
- Participated in all phases of data mining; data collection, data cleaning, developing models, validation, visualization and performed Gap analysis.
- Performed K-means clustering, Multivariate analysis and Support Vector Machines in Python and R.
- Developed Clustering algorithms and Support Vector Machines that improved Customer segmentation and Market Expansion.
- Professional Tableau user (Desktop, Online, and Server)
- Data storyteller; mined data from different data sources such as SQL Server, Oracle, cube databases, web analytics, Business Objects and Hadoop.
- Provided ad hoc analysis and reports to the executive-level management team.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI and Smart View.
- Developed analytics and strategy to integrate B2B analytics in outbound calling operations
- Implemented analytics delivery on cloud-based visualization platforms such as Business Objects and Google Analytics.
- Served as SPOC data scientist and predictive analyst to create annual and quarterly business forecast reports.
- Served as the main source of the business regression report.
- Creating various B2B Predictive and descriptive analytics using R and Tableau
- Creating and automating ad hoc reports
- Responsible for planning & scheduling new product releases and promotional offers
- Worked on NoSQL databases like Cassandra.
- Experienced in Agile methodologies and SCRUM process.
- Parsing data, producing concise conclusions from raw data in a clean, well-structured and easily maintainable format.
- Extracted data from HDFS and prepared data for exploratory analysis using data munging.
- Worked on Text Analytics, Naive Bayes, Sentiment analysis, creating word clouds and retrieving data from Twitter and other social networking platforms.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in R and Python.
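The customer-segmentation clustering described above can be sketched with scikit-learn's K-means (the feature values are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer features: (annual spend, monthly visits)
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # low-activity customers
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],   # high-activity customers
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_  # cluster assignment per customer, one segment per group
```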
Environment: R, Python, UNIX Scripting, Cassandra, Java, Hadoop, MapReduce, HDFS, Pig, Sqoop, Hive, Oracle, Eclipse
- Worked on the project from gathering requirements to developing the entire application.
- Worked in an Anaconda Python environment: created, activated and programmed in Anaconda environments. Wrote programs for performance calculations using NumPy and SQLAlchemy.
- Wrote Python routines to log into websites and fetch data for selected options.
- Used the Python modules urllib, urllib2 and Requests for web crawling.
- Involved in development of web services using SOAP for sending and receiving data from the external interface in XML format; used packages such as Beautiful Soup for data parsing.
- Worked on development of SQL and stored procedures on MySQL.
- Analyzed the code thoroughly and reduced code redundancy to an optimal level.
- Designed and built a text classification application using different text classification models.
- Used Jira for defect tracking and project management.
- Wrote and read data in CSV and Excel file formats.
- Involved in Sprint planning sessions and participated in the daily Agile SCRUM meetings.
- Conducted the daily scrum as part of the Scrum Master role.
- Developed the project in Linux environment.
- Worked on the application's resulting reports.
- Performed QA testing on the application.
- Held meetings with the client and delivered the entire project with limited client assistance.
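The crawling-and-parsing routines above can be sketched with Beautiful Soup (the page content here is a hypothetical canned string; in practice it would come from Requests or urllib after logging in):

```python
from bs4 import BeautifulSoup

# Hypothetical page markup standing in for a fetched response body
html = """
<html><body>
  <table id="quotes">
    <tr><td>AAA</td><td>10</td></tr>
    <tr><td>BBB</td><td>20</td></tr>
  </table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# Extract each table row as a list of cell texts
rows = [
    [cell.get_text() for cell in tr.find_all("td")]
    for tr in soup.find("table", id="quotes").find_all("tr")
]
```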
Environment: Python, Anaconda, Spyder (IDE), Windows 7, Teradata, Requests, urllib, urllib2, Beautiful Soup, Tableau, Python libraries such as NumPy, SQLAlchemy and MySQLdb.