
Data Scientist Resume


Lowell, Arkansas

PROFESSIONAL SUMMARY:

  • 14+ years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Experience in coding SQL/PL SQL using Procedures, Triggers, and Packages.
  • Extensive experience in Text Analytics, developing statistical Machine Learning and Data Mining solutions to various business problems and generating data visualizations using R and Python.
  • Data-driven and highly analytical, with working knowledge of statistical modeling approaches and methodologies (Clustering, Regression Analysis, Hypothesis Testing, Decision Trees, Machine Learning).
  • Professional working experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, K-Means Clustering and Association Rules.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of data.
  • Experience with data visualization using tools such as ggplot2, Matplotlib, Seaborn, and Tableau, and in using Tableau to publish and present dashboards and storylines on web and desktop platforms.
  • Experienced in Python data manipulation for loading and extraction, as well as with Python libraries such as NumPy, SciPy, and pandas for data analysis and numerical computations.
  • Well experienced in Normalization, De-Normalization and Standardization techniques for optimal performance in relational and dimensional database environments.
  • Worked on Text Mining and Sentiment Analysis for extracting unstructured data from various social media platforms such as Facebook, Twitter, and Reddit.
  • Good knowledge of NoSQL databases like MongoDB and HBase.
  • Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, scikit-learn, and Hadoop MapReduce.
  • Technical proficiency in designing and data modeling for online applications; solution lead for architecting Data Warehouse/Business Intelligence applications.
  • Hands-on experience with Cluster Analysis, Principal Component Analysis (PCA), Association Rules, and Recommender Systems.
  • Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
  • Hands on experience with R-Studio for doing data pre-processing and building machine learning algorithms on different datasets.
  • Collaborated with the lead Data Architect to model the Data warehouse in accordance with FSLDM subject areas, 3NF format, and Snow-flake schema.
  • Worked and extracted data from various database sources like Oracle, SQL Server, and DB2.
  • Predictive Modelling Algorithms: Logistic Regression, Linear Regression, Decision Trees, K-Nearest Neighbors, Bootstrap Aggregation, Naive Bayes Classifier, Random Forests, Boosting, SVM.
  • Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, and Cosmos.
  • Excellent oral and written communication skills. Ability to explain complex technical information to technical and non-technical contacts.
  • Excellent interpersonal skills. Ability to effectively build relationships, promote a collaborative and team environment, and influence others.

TECHNICAL SKILLS:

Bigdata/Hadoop Technologies: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, Zookeeper and Oozie.

Languages: HTML5, CSS3, XML, C, C++, DHTML, WSDL, R/R Studio, SAS Enterprise Guide, SAS, R, Perl, MATLAB, Schemas, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras), SQL, PL/SQL, HiveQL, JavaScript, Shell Scripting.

Java & J2EE Technologies: Core Java, JSP, Servlets, JDBC, JAAS, JNDI, Hibernate, Spring, Struts, JMS, EJB, RESTful

Application Servers: WebLogic, WebSphere, JBoss, Tomcat.

Databases: Microsoft SQL Server, MySQL, Oracle, DB2, Teradata, Netezza

NoSQL Databases: HBase, Cassandra, MongoDB, MariaDB

Build Tools: Jenkins, Maven, ANT, Toad, SQL Loader, RTC, RSA, Control-M, Oozie, Hue, SOAP UI

Business Intelligence Tools: Tableau, Tableau Server, Tableau Reader, Splunk, SAP BusinessObjects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Development and cloud computing tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ, Amazon AWS, Azure

Development Methodologies: Agile/Scrum, Waterfall, UML, Design Patterns

Version Control Tools and Testing: Git, GitHub, SVN and JUnit

ETL Tools: Informatica PowerCenter, SSIS

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos.

Data Modelling Tools: Erwin, Rational Rose, ER/Studio, MS Visio, Oracle Designer, SAP PowerDesigner, Enterprise Architect.

Operating Systems: All versions of UNIX, Windows, Linux, macOS, Sun Solaris

PROFESSIONAL EXPERIENCE:

Confidential, Lowell, Arkansas

Data Scientist

Responsibilities:

  • Utilized Apache Spark with Python to develop and execute Big Data Analytics and Machine Learning applications; executed machine learning use cases with Spark ML and MLlib (a minimal Spark ML sketch follows this list).
  • Setup storage and data analysis tools in Amazon Web services cloud computing infrastructure.
  • Used pandas, NumPy, Seaborn, SciPy, matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms.
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms to the resulting data (a PySpark ingestion sketch follows this list).
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, visualization and performed Gap analysis.
  • Data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.
  • As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, and KNN.
  • Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Interaction with Business Analyst, SMEs, and other Data Architects to understand Business needs and functionality for various project solutions
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
  • Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, and Business Objects.
  • Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and Snowflake Schemas.
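
The Spark ML classification work above can be illustrated with a minimal PySpark sketch. It assumes a hypothetical labeled Parquet dataset (the path and the churned, tenure, monthly_spend, and support_calls columns are invented for illustration) and shows a logistic regression pipeline evaluated with ROC AUC; it is a sketch of the approach, not the production code.

    # Minimal Spark ML classification sketch; path and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer, VectorAssembler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("classification-sketch").getOrCreate()
    df = spark.read.parquet("hdfs:///data/curated/training")   # hypothetical path

    label = StringIndexer(inputCol="churned", outputCol="label")
    features = VectorAssembler(inputCols=["tenure", "monthly_spend", "support_calls"],
                               outputCol="features")
    lr = LogisticRegression(maxIter=50, regParam=0.01)

    train, test = df.randomSplit([0.8, 0.2], seed=42)
    model = Pipeline(stages=[label, features, lr]).fit(train)

    # Area under the ROC curve on the held-out split.
    auc = BinaryClassificationEvaluator(metricName="areaUnderROC").evaluate(model.transform(test))
    print(f"Test AUC: {auc:.3f}")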

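For the JSON ingestion work, a comparable PySpark sketch (an illustrative variant; the Hive/MapReduce loads mentioned above used their own scripts) might read raw JSON, apply a light transformation, and persist the result to HDFS as Parquet. The paths and column names below are hypothetical.

    # Illustrative JSON-to-HDFS ingestion sketch; paths and columns are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("json-ingest-sketch").getOrCreate()

    # Spark infers the schema of the semi-structured JSON feed.
    raw = spark.read.json("hdfs:///data/raw/events/*.json")

    curated = (raw
               .filter(F.col("event_type").isNotNull())
               .withColumn("event_date", F.to_date("event_ts")))

    # Persist in a columnar format, partitioned for downstream queries.
    curated.write.mode("overwrite").partitionBy("event_date").parquet("hdfs:///data/curated/events")
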
Environment: R, ODS, OLTP, Big Data, Oracle 10g, Hive, OLAP, DB2, Metadata, Python, MS Excel, Mainframes, MS Visio, Rational Rose, Teradata, SPSS, T-SQL, PL/SQL, Flat Files, XML, and Tableau.

Confidential, Santa Clara

Data Scientist

Responsibilities:

  • Performed Data Profiling to learn about user behavior across various features such as traffic pattern, location, date, and time.
  • Applied various machine learning algorithms and statistical modeling techniques such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using the scikit-learn package.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, Python, and a broad variety of machine learning methods including classification, regression, and dimensionality reduction, and used the engine to increase user lifetime by 45% and triple user conversations for Confidential categories.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources. Used the K-Means clustering technique to identify outliers and to classify unlabeled data (a K-Means outlier sketch follows this list).
  • Evaluated models using Cross Validation, Log loss function, ROC curves and AUC for feature selection.
  • Analyzed traffic patterns by calculating autocorrelation with different time lags.
  • Developed entire frontend and backend modules using Python on Django Web Framework.
  • Implemented the presentation layer with HTML, CSS, and JavaScript.
  • Involved in writing stored procedures using Oracle.
  • Addressed overfitting by implementing regularization methods such as L1 and L2 (a regularized-classifier sketch follows this list).
  • Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
  • Identified and targeted welfare high-risk groups with Machine learning algorithms.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Created clusters to classify Control and test groups and conducted group campaigns.
  • Developed Linux shell scripts using the NZSQL/NZLOAD utilities to load data from flat files into Netezza.
  • Developed triggers, stored procedures, functions, and packages using cursors and ref cursor concepts associated with the project using PL/SQL.
  • Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time on the new route.
  • Performed data analysis by using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data.
  • Used MLlib, Spark's Machine learning library to build and evaluate different models.
  • Implemented a rule-based expertise system from the results of exploratory analysis and information gathered from people in different departments.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages.
  • Developed a MapReduce pipeline for feature extraction using Hive.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau .
  • Communicated the results to the operations team to support better decision-making.
  • Collected data needs and requirements by interacting with other departments.
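
The K-Means outlier work above can be sketched roughly as follows; the feature matrix here is synthetic stand-in data, and the 99th-percentile cutoff is an assumed threshold rather than the one used on the project.

    # K-Means outlier-flagging sketch on synthetic data.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 4))                    # stand-in for usage features

    X_scaled = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_scaled)

    # Distance of each point to its assigned cluster centre; the farthest
    # ~1% are flagged as candidate outliers for review.
    dists = np.linalg.norm(X_scaled - km.cluster_centers_[km.labels_], axis=1)
    outliers = np.where(dists > np.percentile(dists, 99))[0]
    print(f"{len(outliers)} candidate outliers out of {len(X_scaled)} rows")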

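Similarly, the regularization and evaluation bullets (L1/L2 penalties, cross-validation, ROC/AUC, PCA) can be combined into one hedged scikit-learn sketch on a synthetic dataset; the hyperparameters below are placeholders, not the project settings.

    # PCA + L2-regularized logistic regression, scored by cross-validated ROC AUC.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=2000, n_features=40, n_informative=10, random_state=0)

    # C is the inverse regularization strength; smaller C means a stronger L2 penalty.
    pipe = make_pipeline(StandardScaler(),
                         PCA(n_components=10),
                         LogisticRegression(penalty="l2", C=0.5, max_iter=1000))

    auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
    print(f"Mean cross-validated AUC: {auc.mean():.3f}")
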
Environment: Python, CDH5, HDFS, Hadoop, Hive, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Confidential, Minneapolis, MN

Data Scientist

Responsibilities:

  • Developed applications of Machine Learning, Statistical Analysis and Data Visualizations with challenging data Processing problems in sustainability and biomedical domain.
  • Worked on Natural Language Processing with the NLTK module in Python for application development for automated customer response (an NLTK classifier sketch follows this list).
  • Used predictive modeling with tools in SAS, SPSS, R, Python.
  • Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
  • Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, and Business Objects.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Interaction with Business Analyst, SMEs and other Data Architects to understand Business needs and functionality for various project solutions.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
  • Prepared the ETL architecture and design document covering the ETL architecture, SSIS design, and the extraction, transformation, and loading of Duck Creek data into the dimensional model.
  • Applied linear regression, multiple regression, the ordinary least squares method, mean-variance analysis, the law of large numbers, logistic regression, dummy variables, residuals, the Poisson distribution, Bayes' theorem, Naive Bayes, fitting functions, etc. to data with the help of scikit-learn, SciPy, NumPy, and pandas.
  • Applied clustering algorithms such as hierarchical and K-Means with the help of scikit-learn and SciPy.
  • Developed visualizations and dashboards using ggplot, Tableau
  • Worked on development of data warehouse, Data Lake, and ETL systems using relational and non-relational tools like SQL and NoSQL.
  • Built and analyzed datasets using R, SAS, Matlab and Python (in decreasing order of usage).
  • Applied linear regression in Python and SAS to understand the relationship between different attributes of the dataset and the causal relationships between them (a linear-regression sketch follows this list).
  • Expertise in Business Intelligence and data visualization using R and Tableau.
  • Validated the Macro-Economic data and predictive analysis of world markets using key indicators in Python and machine learning concepts like regression, Bootstrap Aggregation and Random Forest.
  • Worked in large scale database environment like Hadoop and MapReduce, with working mechanism of Hadoop clusters, nodes and Hadoop Distributed File System (HDFS).
  • Interfaced with large scale database system through an ETL server for data extraction.
  • Identified patterns, data quality issues, and opportunities and leveraged insights by communicating opportunities with business partners.
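
The NLTK-based automated customer response work might look, in miniature, like the intent-classification sketch below. The tiny training set and intent labels are invented for illustration, and tokenization is simplified to a whitespace split (a real pipeline would use NLTK's tokenizers).

    # Minimal NLTK Naive Bayes intent-classification sketch; training data is made up.
    from nltk.classify import NaiveBayesClassifier

    def features(text):
        # Bag-of-words features over lower-cased, whitespace-split tokens.
        return {tok: True for tok in text.lower().split()}

    train = [
        ("where is my order",             "order_status"),
        ("i never received my package",   "order_status"),
        ("how do i reset my password",    "account_help"),
        ("i cannot log in to my account", "account_help"),
    ]

    clf = NaiveBayesClassifier.train([(features(text), label) for text, label in train])
    print(clf.classify(features("my package has not arrived")))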

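The linear-regression bullet on attribute relationships can be hedged into a short scikit-learn sketch; the data frame below is synthetic, and the column names (ad_spend, store_count, revenue) are invented for illustration.

    # Linear regression to inspect how a target moves with each attribute.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "ad_spend":    rng.uniform(0, 100, 500),
        "store_count": rng.integers(1, 50, 500),
    })
    df["revenue"] = 3.2 * df["ad_spend"] + 1.5 * df["store_count"] + rng.normal(0, 5, 500)

    X, y = df[["ad_spend", "store_count"]], df["revenue"]
    model = LinearRegression().fit(X, y)

    # Coefficients describe the marginal relationship; R^2 measures overall fit.
    print(dict(zip(X.columns, model.coef_.round(2))), round(model.score(X, y), 3))
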
Environment: Machine Learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (scikit-learn/SciPy/NumPy/pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau.

Confidential, Austin, TX

Data Modeler/Data Analyst

Responsibilities:

  • Communicated effectively in both a verbal and written manner to client team.
  • Completed documentation on all assigned systems and databases, including business rules, logic.
  • Created test data and test case documentation for regression and performance testing.
  • Designed, built, and implemented relational databases.
  • Determined changes in physical database by studying project requirements.
  • Developed intermediate business knowledge of the functional area and its processes to understand how data supports the business function.
  • Facilitated gathering of moderately complex business requirements by defining the business problem.
  • Utilized SPSS statistical software to track and analyze data.
  • Optimized data collection procedures and generated reports on a weekly, monthly, and quarterly basis.
  • Used advanced Microsoft Excel to create pivot tables, used VLOOKUP and other Excel functions.
  • Successfully interpreted data to draw conclusions for managerial action and strategy.
  • Created Data chart presentations and coded variables from original data, conducted statistical analysis as and when required and provided summaries of analysis.

Environment: Data Analysis, SQL, FTP, SFTP, XML, Web Services

Confidential

Data Analyst

Responsibilities:

  • Processed data received from vendors and loaded it into the database; the process was carried out on a weekly basis and reports were delivered bi-weekly.
  • Documented requirements and obtained signoffs.
  • Coordinated between the Business users and development team in resolving issues.
  • Documented data cleansing and data profiling.
  • Wrote SQL scripts to meet the business requirements.
  • Analyzed views and produced reports.
  • Tested cleansed data for integrity and uniqueness.
  • Automated the existing system to achieve faster and more accurate data loading.
  • Learned to create Business Process Models.
  • Managed multiple projects simultaneously, tracking them toward varying timelines through a combination of business and technical skills.
  • Good understanding of clinical practice management and of medical and laboratory billing and insurance claim processing, with process flow diagrams.
  • Assisted QA team in creating test scenarios that cover a day in a life of the patient for Inpatient and Ambulatory workflows.

Environment: SQL, data profiling, data loading, QA team.

Confidential

Data Analyst

Responsibilities:

  • Implemented Microsoft Visio and Rational Rose for designing the Use Case Diagrams, Class model, Sequence diagrams, and Activity diagrams for SDLC process of the application
  • Worked with other teams to analyze customers and marketing parameters.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Was a part of the complete life cycle of the project from the requirements to the production support
  • Created test plan documents for all back-end database modules
  • Used MS Excel, MS Access and SQL to write and run various queries.
  • Used a traceability matrix to trace the requirements of the organization.
  • Recommended structural changes and enhancements to systems and databases.
  • Supported the testing team with System testing, Integration testing, and UAT.
  • Ensured quality in the deliverables.

Environment: UNIX, SQL, Oracle 10g, MS Office, MS Visio.
