Data Scientist Resume
Lowell, Arkansas
PROFESSIONAL SUMMARY:
- 14+ years of experience in Machine Learning, Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
- Experience in coding SQL/PL/SQL using Procedures, Triggers, and Packages.
- Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R and Python.
- Data-driven and highly analytical, with working knowledge of statistical modeling approaches and methodologies (Clustering, Regression Analysis, Hypothesis Testing, Decision Trees, Machine Learning).
- Professional working experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, K-Means Clustering and Association Rules.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of data.
- Experience with data visualization using tools like ggplot2, Matplotlib, Seaborn, and Tableau, and using Tableau to publish and present dashboards and storylines on web and desktop platforms.
- Experienced in Python data manipulation for loading and extraction, as well as with Python libraries such as NumPy, SciPy, and pandas for data analysis and numerical computations.
- Well experienced in Normalization, De-Normalization and Standardization techniques for optimal performance in relational and dimensional database environments.
- Worked on Text Mining and Sentiment Analysis to extract unstructured data from various social media platforms like Facebook, Twitter, and Reddit.
- Good knowledge of NoSQL databases like MongoDB and HBase.
- Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, Scikit-learn, and Hadoop MapReduce.
- Technical proficiency in designing and data modeling for online applications; solution lead for architecting Data Warehouse/Business Intelligence applications.
- Hands-on with Cluster Analysis, Principal Component Analysis (PCA), Association Rules, and Recommender Systems.
- Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
- Hands-on experience with RStudio for data pre-processing and building machine learning algorithms on different datasets.
- Collaborated with the lead Data Architect to model the Data Warehouse in accordance with FSLDM subject areas, 3NF format, and Snowflake schema.
- Worked and extracted data from various database sources like Oracle, SQL Server, and DB2.
- Predictive Modelling Algorithms: Logistic Regression, Linear Regression, Decision Trees, K-Nearest Neighbors, Bootstrap Aggregation, Naive Bayes Classifier, Random Forests, Boosting, SVM.
- Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, and Cosmos.
- Excellent oral and written communication skills. Ability to explain complex technical information to technical and non-technical contacts.
- Excellent interpersonal skills. Ability to effectively build relationships, promote a collaborative and team environment, and influence others.
TECHNICAL SKILLS:
Big Data/Hadoop Technologies: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, ZooKeeper, and Oozie.
Languages: HTML5, CSS3, XML, C, C++, DHTML, WSDL, R/RStudio, SAS Enterprise Guide, SAS, R, Perl, MATLAB, Schemas, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, pandas, Gensim, Keras), SQL, PL/SQL, HiveQL, JavaScript, Shell Scripting.
Java & J2EE Technologies: Core Java, JSP, Servlets, JDBC, JAAS, JNDI, Hibernate, Spring, Struts, JMS, EJB, Restful
Application Servers: Web Logic, Web Sphere, JBoss, Tomcat.
Databases: Microsoft SQL Server, MySQL, Oracle, DB2, Teradata, Netezza
NoSQL Databases: HBase, Cassandra, MongoDB, MariaDB
Build Tools: Jenkins, Maven, ANT, Toad, SQL Loader, RTC, RSA, Control-M, Oozie, Hue, SOAP UI
Business Intelligence Tools: Tableau, Tableau Server, Tableau Reader, Splunk, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse
Development and Cloud Computing Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ, Amazon AWS, Azure
Development Methodologies: Agile/Scrum, Waterfall, UML, Design Patterns
Version Control Tools and Testing: Git, GitHub, SVN, and JUnit
ETL Tools: Informatica PowerCenter, SSIS
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos.
Data Modelling Tools: Erwin, Rational Rose, ER/Studio, MS Visio, Oracle Designer, SAP PowerDesigner, Enterprise Architect.
Operating Systems: UNIX (all versions), Windows, Linux, macOS, Sun Solaris
PROFESSIONAL EXPERIENCE:
Confidential, Lowell, Arkansas
Data Scientist
Responsibilities:
- Utilized Apache Spark with Python to develop and execute Big Data analytics and Machine Learning applications; executed machine learning use cases under Spark ML and MLlib (see the illustrative sketch below).
- Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
- Used pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
- Worked on different data formats such as JSON and XML and applied machine learning algorithms to them.
- Worked with Data Architects and IT Architects to understand the movement of data and its storage.
- Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
- Implemented Agile Methodology for building an internal application.
- Good knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- As architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
- Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, and KNN.
- Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
- Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
- Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
- Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
- Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, and Business Objects.
- Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and Snowflake Schemas.
Environment: R, ODS, OLTP, Big Data, Oracle 10g, Hive, OLAP, DB2, Metadata, Python, MS Excel, Mainframes, MS Visio, Rational Rose, Teradata, SPSS, T-SQL, PL/SQL, Flat Files, XML, and Tableau.
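A minimal, illustrative sketch of the kind of Spark ML classification pipeline described in this role. This is an assumption-based example, not the project's actual code; the input path, the column names feature_1, feature_2, and label, and the application name are hypothetical placeholders.

    # Hedged sketch: minimal Spark ML classification pipeline (assumed setup).
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("classification-sketch").getOrCreate()

    # Load a pre-cleaned dataset (path is a hypothetical placeholder).
    df = spark.read.parquet("s3://example-bucket/training_data/")

    # Assemble raw columns into the single feature vector Spark ML expects.
    assembler = VectorAssembler(inputCols=["feature_1", "feature_2"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")

    # Fit the pipeline and inspect a few predictions.
    model = Pipeline(stages=[assembler, lr]).fit(df)
    model.transform(df).select("label", "prediction", "probability").show(5)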
Confidential, Santa Clara
Data Scientist
Responsibilities:
- Performed Data Profiling to learn about behavior across various features such as traffic pattern, location, date, and time.
- Applied various machine learning algorithms and statistical modeling techniques such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using the Scikit-learn package.
- Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, and dimensionality reduction, and used the engine to increase user lifetime by 45% and triple user conversions for Confidential categories.
- Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources; used the K-Means clustering technique to identify outliers and classify unlabeled data.
- Evaluated models using Cross Validation, Log Loss, ROC curves, and AUC for feature selection (see the illustrative sketch below).
- Analyzed traffic patterns by calculating autocorrelation with different time lags.
- Developed entire frontend and backend modules using Python on Django Web Framework.
- Implemented the presentation layer with HTML, CSS, and JavaScript.
- Involved in writing stored procedures using Oracle.
- Addressed overfitting by implementing regularization methods such as L2 and L1.
- Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
- Identified and targeted welfare high-risk groups with Machine learning algorithms.
- Developed Tableau visualizations and dashboards using Tableau Desktop.
- Created clusters to classify Control and test groups and conducted group campaigns.
- Developed Linux Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza .
- Developed triggers, stored procedures, functions, and packages using cursors and ref cursor concepts associated with the project using PL/SQL.
- Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time for the new route.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data.
- Used MLlib, Spark's Machine learning library to build and evaluate different models.
- Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages.
- Developed a MapReduce pipeline for feature extraction using Hive.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau .
- Communicated results to the operations team to support better decision-making.
- Collected data needs and requirements by interacting with other departments.
Environment: Python, CDH5, HDFS, Hadoop, Hive, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.
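An illustrative sketch of the model-evaluation approach referenced above (cross-validation, ROC/AUC, L2 regularization to address overfitting). The synthetic dataset is a hypothetical stand-in for the confidential project data; this is not the project's actual code.

    # Hedged sketch: L2-regularized logistic regression evaluated with cross-validated AUC.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.metrics import roc_auc_score

    # Synthetic placeholder data (the real project data is confidential).
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # L2 penalty addresses overfitting; C controls regularization strength.
    clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)

    # 5-fold cross-validated AUC on the training split.
    cv_auc = cross_val_score(clf, X_train, y_train, cv=5, scoring="roc_auc")
    print("CV AUC: %.3f +/- %.3f" % (cv_auc.mean(), cv_auc.std()))

    # Held-out AUC on the test split.
    clf.fit(X_train, y_train)
    print("Test AUC: %.3f" % roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))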
Confidential, Minneapolis, MN
Data Scientist
Responsibilities:
- Developed applications of Machine Learning, Statistical Analysis, and Data Visualization for challenging data processing problems in the sustainability and biomedical domains.
- Worked on Natural Language Processing with the NLTK module of Python for application development for automated customer response.
- Used predictive modeling with tools in SAS, SPSS, R, Python.
- Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
- Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, and Business Objects.
- Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Interaction with Business Analyst, SMEs and other Data Architects to understand Business needs and functionality for various project solutions.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
- Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Prepared ETL architecture and design documents covering ETL architecture, SSIS design, and the extraction, transformation, and loading of Duck Creek data into the dimensional model.
- Applied linear regression, multiple regression, the ordinary least squares method, mean-variance analysis, the law of large numbers, logistic regression, dummy variables, residuals, the Poisson distribution, Bayes, Naive Bayes, fitting functions, etc. to data with the help of Scikit-learn, SciPy, NumPy, and pandas (see the illustrative sketch below).
- Applied clustering algorithms, i.e., hierarchical and K-means, with the help of Scikit-learn and SciPy.
- Developed visualizations and dashboards using ggplot and Tableau.
- Worked on development of data warehouse, Data Lake, and ETL systems using relational and non-relational tools like SQL and NoSQL.
- Built and analyzed datasets using R, SAS, Matlab and Python (in decreasing order of usage).
- Applied linear regression in Python and SAS to understand the relationship between different attributes of the dataset and the causal relationship between them.
- Expertise in Business Intelligence and data visualization using R and Tableau.
- Validated the Macro-Economic data and predictive analysis of world markets using key indicators in Python and machine learning concepts like regression, Bootstrap Aggregation and Random Forest.
- Worked in large-scale database environments like Hadoop and MapReduce, with knowledge of the working mechanism of Hadoop clusters, nodes, and the Hadoop Distributed File System (HDFS).
- Interfaced with large-scale database systems through an ETL server for data extraction.
- Identified patterns, data quality issues, and opportunities and leveraged insights by communicating opportunities with business partners.
Environment: Machine Learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-Learn/SciPy/NumPy/Pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau.
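An illustrative sketch of the ordinary least squares regression work referenced above. The synthetic data and coefficient values are hypothetical placeholders, not details from the actual project.

    # Hedged sketch: ordinary least squares linear regression with residual diagnostics.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Synthetic placeholder data: three explanatory attributes and a noisy target.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

    # Ordinary least squares fit.
    model = LinearRegression().fit(X, y)
    print("coefficients:", model.coef_, "intercept:", model.intercept_)
    print("R^2:", model.score(X, y))

    # Residuals for diagnostic checks (e.g., normality, homoscedasticity).
    residuals = y - model.predict(X)
    print("residual mean: %.4f, std: %.4f" % (residuals.mean(), residuals.std()))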
Confidential, Austin, TX
Data Modeler/Data Analyst
Responsibilities:
- Communicated effectively, both verbally and in writing, with the client team.
- Completed documentation on all assigned systems and databases, including business rules and logic.
- Created test data and test case documentation for regression and performance testing.
- Designed, built, and implemented relational databases.
- Determined changes in physical database by studying project requirements.
- Developed intermediate business knowledge of the functional area and processes to understand how data and information support business functions.
- Facilitated the gathering of moderately complex business requirements by defining the business problem.
- Utilized SPSS statistical software to track and analyze data.
- Optimized data collection procedures and generated reports on a weekly, monthly, and quarterly basis.
- Used advanced Microsoft Excel to create pivot tables, used VLOOKUP and other Excel functions.
- Successfully interpreted data to draw conclusions for managerial action and strategy.
- Created Data chart presentations and coded variables from original data, conducted statistical analysis as and when required and provided summaries of analysis.
Environment: Data Analysis, SQL, FTP, SFTP, XML, Web Services
Confidential
Data Analyst
Responsibilities:
- Processed data received from vendors and loaded it into the database. The process was carried out on a weekly basis and reports were delivered bi-weekly.
- Documented requirements and obtained signoffs.
- Coordinated between the Business users and development team in resolving issues.
- Documented data cleansing and data profiling.
- Wrote SQL scripts to meet the business requirements.
- Analyzed views and produced reports.
- Tested cleansed data for integrity and uniqueness.
- Automated the existing system to achieve faster and more accurate data loading.
- Learned to create Business Process Models.
- Managed multiple projects simultaneously, tracking them against varying timelines through a combination of business and technical skills.
- Good understanding of clinical practice management, medical and laboratory billing, and insurance claim processing, with process flow diagrams.
- Assisted QA team in creating test scenarios that cover a day in a life of the patient for Inpatient and Ambulatory workflows.
Environment: SQL, data profiling, data loading, QA team.
Confidential
Data Analyst
Responsibilities:
- Used Microsoft Visio and Rational Rose to design the Use Case diagrams, Class model, Sequence diagrams, and Activity diagrams for the SDLC process of the application.
- Worked with other teams to analyze customers and marketing parameters.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Was a part of the complete life cycle of the project, from requirements to production support.
- Created test plan documents for all back-end database modules.
- Used MS Excel, MS Access and SQL to write and run various queries.
- Used a traceability matrix to trace the requirements of the organization.
- Recommended structural changes and enhancements to systems and databases.
- Supported maintenance in the testing team for System Testing, Integration Testing, and UAT.
- Ensured quality in the deliverables.
Environment: UNIX, SQL, Oracle 10g, MS Office, MS Visio.