
Data Scientist Resume


Radnor, PA

OBJECTIVE:

Intuitive, results-oriented data scientist skilled at applying machine learning algorithms to statistical data. Professionally qualified Data Scientist/Data Analyst with around 8 years of experience in Data Science and Analytics, including Deep Learning/Machine Learning, Data Mining, and Statistical Analysis.

SUMMARY:

  • Involved in data extraction, data cleaning, statistical modeling, and data visualization with large data sets of structured and unstructured data; created ER diagrams and schemas.
  • Experience working in R, Python, Jupyter, pandas, NumPy, scikit-learn, Matplotlib, PyHive, Keras, Hive, NoSQL (HBase), Sqoop, Pig, MapReduce, Oozie, and Spark MLlib.
  • Hands-on experience in Linear and Logistic Regression, K-means Cluster Analysis, Decision Trees, KNN, SVM, Random Forests, Text Mining/Text Analytics, and Time Series Forecasting (a minimal sketch follows this summary).
  • Experience in providing data mining solutions to various business problems and generating data visualizations using R, Python, and Tableau.
  • Experienced with BI tools such as OLAP, data warehousing, reporting and querying tools, data mining, and spreadsheets.
  • Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMSs such as SQL Server 2008 and NoSQL databases such as MongoDB 3.2.
  • Hands-on experience in Big Data technologies such as Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS, and Hive 1.x.
  • Experienced with the full software life cycle under SDLC, Agile, DevOps, and Scrum methodologies, including creating requirements and test plans.
  • Worked with the SAS Enterprise suite, R, Python, and Big Data technologies including Hadoop, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, MapReduce, and Cloudera Manager for the design of BI applications.
  • Worked with complex applications such as R, SAS, MATLAB, and SPSS to develop neural network and cluster analysis models.
  • Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities such as BTEQ, FastLoad, MultiLoad, and FastExport.
  • Experience with data visualization, producing tables, graphs, and listings using Tableau.
  • Experience with Big Data tools with Hadoop, HDFS, MapReduce, and Spark.
  • Experienced in data integration validation and data quality controls for ETL processes and data warehousing using MS Visual Studio SSIS, SSAS, and SSRS.
  • Proficient in Tableau and R Shiny data visualization tools for analyzing large datasets and creating visually powerful, actionable interactive reports and dashboards.
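
The following is a minimal sketch of the supervised modeling workflow referenced above, assuming Python with scikit-learn; the data is synthetic and every name is illustrative rather than drawn from any client engagement.

```python
# Minimal logistic regression workflow sketch (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))             # synthetic feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # synthetic binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```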

TECHNICAL SKILLS:

Data Modeling Tools: ERwin r7.1/7.2/7.3/8.0, Oracle Designer 12.0, and ER Studio 8.5.3.

Programming Languages: C/C++, Java, Python, Scala, UNIX shell, Bash, HTML5, SQL, SPSS

Machine Learning Algorithms: Hypothesis Testing, Linear Regression, Logistic Regression, Clustering Techniques, Neural Networks

Statistical Tools: R, Excel, SAS

Visualization Tools: R, Excel, SAS

Big Data Technologies: Hadoop, MapReduce, Hive, Business Objects, MicroStrategy, Spark, HDFS

Operating Systems: Microsoft Windows 9x/NT/2000/XP/Vista/7 and UNIX

Databases: Oracle, MS SQL Server, MS Access, NoSQL, Teradata, MongoDB, Netezza

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse.

WORK EXPERIENCE:

Data Scientist

Confidential, Radnor, PA

Responsibilities:

  • Built data pipelines for reporting, alerting, and data mining. Experienced with table design and data management using HDFS, Hive, Impala, Sqoop, MySQL, MemSQL, Grafana/InfluxDB, and Kafka.
  • Worked with statistical models for data analysis, predictive modeling, machine learning approaches, and recommendation and optimization algorithms.
  • Worked in business/data analysis, data profiling, data migration, data integration & metadata management services.
  • Worked extensively on databases, primarily Oracle 11g/12c, writing PL/SQL scripts for multiple purposes.
  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest using R and Python packages (see the XGBoost sketch after this list).
  • Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
  • Worked with Big Data technologies such as Hadoop, Hive, and MapReduce.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Performed scoring and financial forecasting for collection priorities using Python, R and SAS machine learning algorithms.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Developed a model for predicting repayment of debt owed to small and medium enterprise (SME) businesses.
  • Created SQL scripts, analyzed the data in MS Access/Excel, and worked on SQL and SAS script mapping.
  • Rapidly created models in Python using pandas, NumPy, scikit-learn, and Plotly for data visualization; these models were then implemented in SAS, where they interfaced with MS SQL databases and were scheduled to update on a regular basis.
  • Performed data analysis using regression, data cleaning, Excel VLOOKUP, histograms, and the TOAD client, and presented the analysis and suggested solutions to investors.
  • Good knowledge of Hadoop data lake implementation and Hadoop architecture for client business data management.
  • Identified relevant key performance factors and tested their statistical significance.
  • The above scoring models resulted in millions of dollars of added revenue for the company.
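
As a brief illustration of the classification modeling described above, here is a minimal Python sketch assuming the xgboost and scikit-learn packages; the data is generated synthetically, since the actual features and labels were proprietary.

```python
# Sketch of a gradient-boosted classifier with xgboost (synthetic data).
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = XGBClassifier(
    n_estimators=200, max_depth=4, learning_rate=0.1, eval_metric="logloss"
)
clf.fit(X_train, y_train)
print("test AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```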

Environment: R, SQL, Python 2.7.x, SQL Server 2014, NLTK, XML, Hive, Hadoop, GraphLab, NoSQL, SAS, SPSS, Spark, Kafka, HBase, MLlib.

Data Scientist

Confidential, Boston, MA

Responsibilities:

  • Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Participated in all phases of data mining, including data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Used pandas, NumPy, seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop machine learning models, utilizing algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build solutions that optimize data quality and performance.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, cubes, normalization (3NF), and de-normalization of databases.
  • Worked on customer segmentation using an unsupervised learning technique, clustering (a minimal sketch follows this list).
  • Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.
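
The customer segmentation work above can be illustrated with a minimal K-means sketch in Python; the customer attributes below are hypothetical stand-ins for the real (confidential) features.

```python
# Sketch of unsupervised customer segmentation with K-means (synthetic data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Hypothetical features: annual spend, visit frequency, tenure in years.
customers = rng.normal(loc=[500.0, 12.0, 3.0], scale=[150.0, 4.0, 1.0],
                       size=(1000, 3))

scaled = StandardScaler().fit_transform(customers)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=7)
segments = kmeans.fit_predict(scaled)
print("customers per segment:", np.bincount(segments))
```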

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SSRS, PL/SQL, T-SQL, Tableau, cluster analysis, Spark, Kafka, MongoDB, logistic regression, Hadoop, SVM, XML, Cassandra, MapReduce, AWS.

Data Scientist

Confidential, Wichita, KS

Responsibilities:

  • This project focused on customer segmentation using machine learning and statistical modeling, including building predictive models and generating data products to support segmentation.
  • Used R programming to visualize the data and implement machine learning algorithms.
  • Developed a pricing model for various bundled service offerings to optimize and predict gross margin.
  • Built a price elasticity model for various bundled product and service offerings.
  • Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment and product recommendation.
  • Applied various machine learning algorithms and statistical models such as decision trees, regression models, neural networks, and SVM to identify volume, using packages in R.
  • Performed data imputation using the scikit-learn package in Python (see the pipeline sketch after this list).
  • Performed data processing using Python libraries such as NumPy and pandas.
  • Performed data analysis using the ggplot2 library in R to create visualizations for a better understanding of customer behavior.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster.
  • Hands-on experience implementing Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
  • Performed K-means clustering, Multivariate analysis, and Support Vector Machines in R.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Implemented the presentation layer with HTML, CSS, and JavaScript.
  • Prepared Data Visualization reports for the management using R, Tableau, and Power BI.
  • Worked collaboratively throughout the complete analytics lifecycle, including data extraction/preparation, design and implementation of scalable machine learning analyses and solutions, and documentation of results.
  • Approached analytical problems with an appropriate blend of statistical rigor and practical business intuition.
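
The imputation and feature-scaling steps mentioned in this list can be sketched as a short scikit-learn pipeline; the DataFrame columns below are invented purely for illustration.

```python
# Sketch of imputation plus feature scaling with scikit-learn (invented data).
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "monthly_usage": [12.0, np.nan, 30.5, 8.2, np.nan],
    "tenure_months": [24.0, 6.0, np.nan, 60.0, 12.0],
})

prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # standardize each feature
])
print(prep.fit_transform(df))
```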

Environment: R/R Studio, SAS, Python, Hive, Hadoop, MS Excel, MS SQL Server, Power BI, Tableau, T-SQL, ETL, MS Access, XML, JSON, MS Office 2007, Outlook.

Data Analyst

Confidential, Tampa, FL

Responsibilities:

  • Responsible for the study of SAS code, SQL queries, analysis enhancements, and documentation of the system.
  • Used R, SAS, and SQL to manipulate data, and develop and validate quantitative models.
  • Led brainstorming sessions and proposed hypotheses, approaches, and techniques.
  • Analyzed data collected in stores (JCL jobs, stored procedures, and queries) and provided reports to the business team by storing the data in Excel/SPSS/SAS files.
  • Performed Analysis and Interpretation of the reports on various findings.
  • Responsible for production support, abend resolution, and other production support activities, and for comparing seasonal trends in the data using Excel.
  • Used advanced Microsoft Excel functions such as pivot tables and VLOOKUP to analyze the data (a pandas analogue is sketched after this list).
  • Successfully implemented migration of the client's application from the Test/DSS/Model regions to production.
  • Prepared SQL scripts for ODBC and Teradata servers for analysis and modeling.
  • Provided comprehensive analysis of trends in the financial time series data.
  • Performed various statistical tests to give the client a clear understanding of the data.
  • Implemented procedures for extracting Excel sheet data into the mainframe environment by connecting to the database using SQL.
  • Provided complete support for all regions (Test/Model/System/Regression/Production).
  • Actively involved in analysis, development, and unit testing of the data.
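
For illustration, the Excel pivot-table and VLOOKUP analysis above maps naturally onto pandas; the sketch below uses invented data and swaps Excel for pandas purely to show the equivalent operations in code.

```python
# Pandas analogue of the Excel VLOOKUP + pivot-table workflow (invented data).
import pandas as pd

sales = pd.DataFrame({
    "store_id": [1, 1, 2, 2, 3],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1"],
    "revenue": [100.0, 120.0, 90.0, 95.0, 200.0],
})
stores = pd.DataFrame({"store_id": [1, 2, 3],
                       "region": ["East", "East", "West"]})

# VLOOKUP equivalent: join the region onto each sales row.
merged = sales.merge(stores, on="store_id", how="left")

# Pivot-table equivalent: total revenue by region and quarter.
pivot = merged.pivot_table(index="region", columns="quarter",
                           values="revenue", aggfunc="sum")
print(pivot)
```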

Environment: R/R Studio, SQL Enterprise Manager, SAS, Microsoft Excel, Microsoft Access, Outlook.

Java Developer

Confidential

Responsibilities:

  • Worked on designing the algorithms, Business components, Java service programs.
  • Involved in the development of the UI with HTML, JavaScript, CSS, and XML using the Spring MVC framework.
  • Responsible for writing unit test cases and code coverage for each module.
  • Actively involved in project module estimations and designs.
  • Used jQuery to make the application more interactive and used JSON objects effectively for client-side coding.
  • Used XML parsing techniques for data handling in JavaScript front-end pages.
  • Led the effort in reconfiguring the project from IntelliJ IDEA to NetBeans IDE.
  • Wrote new SQL and PL/SQL stored procedures and modified existing ones as required in the MySQL database.
  • Performed JavaScript validations on the data submitted by the user.
  • Used the Spring MVC framework at the front end and implemented modules in Node.js to integrate with requirements.
  • Provided data persistence through an ORM solution, Hibernate, for application save, update, and delete operations.
  • Used Core Java techniques like Multithreading, Collections, Generics in the development phase.
  • Worked on JPA for persisting the objects into the system.
  • Involved in creating scenarios for performance testing and followed up with the team to run the scripts.
  • Developed Unit test cases using JUnit.
  • Deployed the apps on a UNIX box and used FileZilla to retrieve the logs from the UNIX box.
  • Implemented the middle tier using Spring MVC to process client requests and execute server-side code.

Environment: Java 1.5, Spring, Hibernate, HTML, JavaScript, CSS, MySQL, JUnit, Eclipse IDE, XSLT, AJAX, Oracle 10g, XML, PL/SQL, AngularJS, IntelliJ, Node.js, jQuery, JPA.

Jr Programmer

Confidential

Responsibilities:

  • Used a formal iterative development process from requirements analysis through deployment.
  • Used Oracle SQL Developer with Oracle 10g to run queries confirming results from the application.
  • Development of customized software for the automation of product designing and process planning.
  • Developed authentication and authorization modules so that only authorized persons can access the system.
  • Created use cases, class diagrams, activity diagrams and collaboration diagrams using Rational Rose 2000.
  • Used Struts Tiles for JSP page layouts and the Struts Validator for client-side validations.
  • Heavily used the Struts framework for JSP and servlet development, along with JMS and other J2EE APIs.
  • Eclipse was the IDE used for Java code development.
  • JUnit scripts were used for unit testing the modules developed in the development environment.
  • Used Log4j for logging and run time debugging of the application.
  • Utilized Rational ClearCase as a version control system and for code management.

Environment: JDK 1.5, Hibernate, Oracle 10g, Apache Tomcat 5, XML, RUP, Java, HTML, JavaScript, CSS2
