
Data Scientist Resume

Boston, MA

PROFESSIONAL SUMMARY:

  • 13 years of experience across various stages of software development, especially technical design, development, and testing.
  • 3 years of experience as a data science professional in data analytics, statistical modeling, visualization, and machine learning; strong collaboration skills with the ability to learn and adapt quickly.
  • 10 years of experience in the implementation, customization, and development of Sybase T-SQL and Oracle 9i in Windows and UNIX environments.
  • Experience in data mining with large structured and unstructured datasets, data acquisition, data validation, predictive modeling, and data visualization.
  • Experience in data integration, profiling, validation, cleansing transformations, and data visualization using R and Python.
  • Theoretical foundations and practical hands-on projects related to:
  • Supervised learning (linear and logistic regression, boosted decision trees, Support Vector Machines, neural networks, NLP)
  • Unsupervised learning (clustering, dimensionality reduction, recommender systems)
  • Probability & statistics, experiment analysis, confidence intervals, A/B testing
  • Algorithms and data structures.
  • Experience in migrating data from heterogeneous sources, including a Sybase-to-Teradata migration.
  • Completed an intensive hands-on data analytics boot camp spanning statistics through programming, including data engineering, data visualization, machine learning, and programming in R and SQL.
  • Hands-on experience with Python and libraries such as NumPy, pandas, Matplotlib, NLTK, scikit-learn, and SciPy.
  • Experience with descriptive analysis problems such as frequent pattern mining, clustering, and outlier detection.
  • Worked on machine learning algorithms for classification and regression, including KNN, decision tree, naive Bayes, logistic regression, SVM, and latent factor models (a minimal scikit-learn sketch follows this list).
  • Good exposure to deep learning with TensorFlow in Python.
  • Good knowledge of natural language processing (NLP) and of time series analysis and forecasting using the ARIMA model in Python and R.
  • Good knowledge of Tableau and Power BI for interactive data visualizations.
  • In-depth understanding of NoSQL databases such as MongoDB and HBase.
  • Good exposure to creating pivot tables and charts in Excel.
  • Experience in developing custom reports and different report types: tabular, matrix, and ad hoc.
  • Good analytical and problem-solving skills.
  • Experience with Sybase (T-SQL), Teradata, UNIX, and Actuate 7.
  • Proficient in technical design documentation using Agile/Scrum methodology.
  • Good exposure to SVN, Documentum, TrueChange, GitHub, and other version control tools.
  • Effective communication skills and a track record of coordinating globally distributed teams.
  • Exposure to all phases of the software development life cycle.
  • Project delivery under stringent timelines.
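As referenced in the classification-and-regression bullet above, here is a minimal sketch comparing those model families with scikit-learn. The built-in breast cancer dataset and all hyperparameters are illustrative stand-ins, not the actual project data or settings.

```python
# Minimal sketch: comparing several classifiers with scikit-learn.
# The built-in breast cancer dataset stands in for real project data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
}

for name, model in models.items():
    # Scale features so distance- and margin-based models behave well.
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name:20s} mean accuracy: {scores.mean():.3f}")
```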

TECHNICAL EXPERTISE:

Databases: T-SQL, Oracle SQL, SQL Server, MySQL, MS Access, MongoDB, Cassandra, HBase, Hive, HDFS, Teradata, Netezza.

Web Technologies: JDBC, HTML5, DHTML, XML, CSS3, Web Services, WSDL

Packages: ggplot2, twitteR, NLP, reshape2, pandas, NumPy, SciPy, Matplotlib, scikit-learn.

Languages: Python, C, C++, Java, T-SQL, PL/SQL

Misc. Tools/Schedulers: Crontab, Autosys

Reporting tools: Actuate BIRT

ETL Tools: Informatica, BI/DW

Version Control: SVN, CVS, GitHub

BI Tools: Tableau, Tableau Server, Tableau Reader, Actuate

Others: PuTTY, Secure Shell (SSH), Rapid SQL, SQL Tools, Toad; Six Sigma trained.

PROFESSIONAL EXPERIENCE:

Confidential, Boston, MA

Data Scientist

Environment: Python 3.x, R, HDFS, Tableau

Responsibilities:

  • Performed data profiling to learn about behavior across features such as traffic pattern, location, date, and time.
  • Extracted data from Hive tables by writing efficient Hive queries; collected data needs and requirements by interacting with other departments.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies by removing duplicates and imputing missing values.
  • Applied various machine learning algorithms and statistical models (decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, clustering) to identify volume, using the scikit-learn package in Python and MATLAB.
  • Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks.
  • Developed Spark/Scala, Python, and R code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows big data resources; used K-Means clustering to identify outliers and to classify unlabeled data.
  • Evaluated models using cross-validation, the log loss function, and ROC curves; used AUC for feature selection; worked with Elastic technologies such as Elasticsearch and Kibana.
  • Worked with the NLTK library for NLP data processing and pattern discovery.
  • Categorized comments from different social networking sites into positive and negative clusters using sentiment analysis and text analytics.
  • Ensured the model had a low false positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
  • Performed multinomial logistic regression, random forest, decision tree, and SVM modeling to classify whether a package would be delivered on time on a new route.
  • Implemented models such as logistic regression, random forest, and gradient-boosted trees to predict whether a given die would pass or fail the test.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models (a minimal PySpark sketch follows this list).
  • Communicated the results to the operations team to support decision-making.
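As referenced in the MLlib bullet above, here is a minimal sketch of building and evaluating a Spark MLlib model. The input path, feature column names, and label column are hypothetical placeholders; the real pipeline and data are not shown in this resume.

```python
# Minimal sketch: build and evaluate a Spark MLlib model.
# Path and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("delivery-model").getOrCreate()
df = spark.read.parquet("hdfs:///data/deliveries.parquet")  # hypothetical path

# Assemble raw columns into a single feature vector (hypothetical features).
assembler = VectorAssembler(
    inputCols=["distance_km", "weight_kg", "stops"],
    outputCol="features",
)
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, lr])

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)

# Evaluate with area under the ROC curve, as mentioned above.
evaluator = BinaryClassificationEvaluator(metricName="areaUnderROC")
auc = evaluator.evaluate(model.transform(test))
print(f"AUC: {auc:.3f}")
```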

Confidential, Hoboken, NJ

Data Scientist

Environment: Python, Tableau, Hadoop, Hive, MS SQL Server, MS Access, MS Excel, Outlook, Power BI

Responsibilities:

  • Involved in the complete Software Development Life Cycle (SDLC) process: analyzed business requirements and understood the functional workflow of information from source systems to destination systems.
  • Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Unix commands, NoSQL, and Hadoop.
  • Used pandas, NumPy, seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Used Python and R scripting to implement machine learning algorithms for prediction and forecasting and to visualize the data.
  • Used predictive analysis to create models of customer behavior that correlate with historical data, and used these models to forecast future results.
  • Applied various machine learning algorithms and statistical models (decision trees, random forest, regression models, neural networks, SVM, clustering) to identify volume using the scikit-learn package.
  • Performed data cleaning, applying backward/forward filling methods to handle missing values in the dataset (a minimal pandas sketch follows this list).
  • Developed and validated a neural network classification model to predict the target label.
  • Gained hands-on experience with Hive, Hadoop, HDFS, and related big data technologies.
  • Accomplished data cleansing and analysis using Excel pivot tables, VLOOKUPs, data validation, and graph and chart manipulation in Excel.
  • Designed complex SQL queries, views, stored procedures, functions, and triggers to handle database manipulation and performance.
  • Used SQL and PL/SQL scripts to automate the repeatable tasks of customer feedback survey data collection and distribution, which increased departmental efficiency by 8%.
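As referenced in the data cleaning bullet above, here is a minimal pandas sketch of backward/forward filling on a toy time-indexed series; the real dataset and columns are not shown here.

```python
# Minimal sketch: backward/forward filling of missing values with pandas.
import numpy as np
import pandas as pd

s = pd.Series(
    [1.0, np.nan, np.nan, 4.0, np.nan],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Forward fill propagates the last known value; backward fill pulls the
# next known value back. Chaining the two covers leading and trailing gaps.
filled = s.ffill().bfill()
print(filled)
```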

Confidential

Data Analyst

Environment: SYBASE, Oracle SQL, PL/SQL, UNIX, VB, .NET

Responsibilities:

  • Developed an understanding of the Research DB, which serves numerous applications.
  • Understood the functional requirements and designed the approach after analyzing the existing BAU (business-as-usual) logic.
  • Involved in coding stored procedures in Sybase.
  • Involved in coding stored procedures and packages in Oracle.
  • Performed unit testing of the business logic (BL).
  • Responsible for the research databases and a suite of front-end tools for updating and retrieving research data, company disclosure information, and reporting functions.

Confidential

Sr. Programmer

Environment: SYBASE, Oracle SQL, PL/SQL, UNIX, Informatica

Responsibilities:

  • Re-engineered the BAU business logic to cater to the new enhancements.
  • Understood the functional requirements and designed the approach after analyzing the gaps in the BAU logic.
  • Involved in coding stored procedures in Sybase.
  • Involved in coding stored procedures and packages in Oracle.
  • Performed unit testing of the business logic (BL).
  • Participated in the UI testing of the CN i-Portal screens.

Confidential

Sr. Programmer

Environment: SYBASE, UNIX, Informatica

Responsibilities:

  • Performed dual development and dual maintenance of code in Sybase and Teradata until the whole database was migrated.
  • Involved in coding stored procedures in Sybase and Teradata.
  • Gathered data feed requirements from the data providers and performed ETL to load the data into the IDN warehouse using the MDF (Metadata Driven Framework, a Confidential-specific utility).
  • Coded stored procedures for running the application scanners; these scanners run on production (OLTP) data and generate reports with the required data for the downstream business.
  • Scheduled the warehouse and production processes using Event Engine, the Confidential in-house scheduler.
  • As this was a Sybase-to-Teradata migration project, mentored team members as needed in analyzing the Sybase procedures.
  • Participated actively in the deployment and QA processes until the Post Implementation Verification (PIV) sign-off was received.
  • Implemented the plan for the project's rollover to production.

Confidential

Team Lead

Environment: SYBASE, UNIX, Informatica

Responsibilities:

  • Successfully took over the transition from Syntel to Confidential and played back the knowledge transfer (KT) received.
  • Brought the system to a stable state ahead of expected timelines and took on new development work.
  • Implemented software releases and patches in the user acceptance and production environments.
  • Performed defect analysis of real-time issues in the production environment.

Confidential

Team Lead

Environment: SYBASE, UNIX, Informatica

Responsibilities:

  • Analyzed the existing architecture.
  • Supported team members with the Sybase stored procedures during the system's migration from PowerBuilder to Java.

Confidential

Developer

Environment: SYBASE, UNIX

Responsibilities:

  • Gained an understanding of the product Confidential, one of the leading back-office settlement systems.
  • Analyzed, designed, and developed reports such as All Open Trades and the Settlement Due Report, plus various extracts related to trades and settlements.
  • Coded Sybase stored procedures and C functions.
  • Involved in various bug fixes for the clients that use Confidential as their settlement system.

Confidential

Jr. Software Engineer

Environment: SYBASE, UNIX, Informatica, Actuate

Responsibilities:

  • Involved in report design using the Actuate 7 ERD-Pro and BIRT tools.
  • Developed Actuate reports backed by Sybase stored procedure coding.
  • Created jobs to be scheduled in UNIX.
  • Performed unit testing, code deployment, and support.
