Data Scientist Resume
Lowell, Arkansas
PROFESSIONAL SUMMARY:
- 14+ years of experience in Machine Learning, Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
- Experience in coding SQL/PL/SQL using Procedures, Triggers, and Packages.
- Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R and Python.
- Data-driven and highly analytical, with working knowledge of statistical modeling approaches and methodologies (Clustering, Regression Analysis, Hypothesis Testing, Decision Trees, Machine Learning).
- Professional working experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, K-Means Clustering and Association Rules.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of data.
- Experience with data visualization using tools like ggplot2, Matplotlib, Seaborn, and Tableau, and using Tableau to publish and present dashboards and storylines on web and desktop platforms.
- Experienced in Python data manipulation for loading and extraction, as well as with Python libraries such as NumPy, SciPy, and pandas for data analysis and numerical computations.
- Well experienced in Normalization, De-Normalization and Standardization techniques for optimal performance in relational and dimensional database environments.
- Worked on Text Mining and Sentiment Analysis to extract unstructured data from various social media platforms like Facebook, Twitter, and Reddit.
- Good knowledge of NoSQL databases like MongoDB and HBase.
- Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, Scikit-learn, and Hadoop MapReduce.
- Technical proficiency in designing and data modeling for online applications; solution lead for architecting Data Warehouse/Business Intelligence applications.
- Hands-on with Cluster Analysis, Principal Component Analysis (PCA), Association Rules, and Recommender Systems.
- Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
- Hands-on experience with RStudio for data pre-processing and building machine learning algorithms on different datasets.
- Collaborated with the lead Data Architect to model the Data Warehouse in accordance with FSLDM subject areas, 3NF format, and Snowflake schema.
- Worked and extracted data from various database sources like Oracle, SQL Server, and DB2.
- Predictive Modelling Algorithms: Logistic Regression, Linear Regression, Decision Trees, K-Nearest Neighbors, Bootstrap Aggregation, Naive Bayes Classifier, Random Forests, Boosting, SVM.
- Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, and Cosmos.
- Excellent oral and written communication skills. Ability to explain complex technical information to technical and non-technical contacts.
- Excellent interpersonal skills. Ability to effectively build relationships, promote a collaborative and team environment, and influence others.
TECHNICAL SKILLS:
Big Data/Hadoop Technologies: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, ZooKeeper, and Oozie.
Languages: HTML5, CSS3, XML, C, C++, DHTML, WSDL, R/RStudio, SAS Enterprise Guide, SAS, R, Perl, MATLAB, Schemas, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, pandas, Gensim, Keras), SQL, PL/SQL, HiveQL, JavaScript, Shell Scripting.
Java & J2EE Technologies: Core Java, JSP, Servlets, JDBC, JAAS, JNDI, Hibernate, Spring, Struts, JMS, EJB, Restful
Application Servers: Web Logic, Web Sphere, JBoss, Tomcat.
Databases: Microsoft SQL Server, MySQL, Oracle, DB2, Teradata, Netezza
NoSQL Databases: HBase, Cassandra, MongoDB, MariaDB
Build Tools: Jenkins, Maven, ANT, Toad, SQL Loader, RTC, RSA, Control-M, Oozie, Hue, SOAP UI
Business Intelligence Tools: Tableau, Tableau Server, Tableau Reader, Splunk, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse
Development and Cloud Computing Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ, Amazon AWS, Azure
Development Methodologies: Agile/Scrum, Waterfall, UML, Design Patterns
Version Control Tools and Testing: Git, GitHub, SVN, and JUnit
ETL Tools: Informatica PowerCenter, SSIS
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos.
Data Modelling Tools: Erwin, Rational Rose, ER/Studio, MS Visio, Oracle Designer, SAP PowerDesigner, Enterprise Architect.
Operating Systems: UNIX (all versions), Windows, Linux, macOS, Sun Solaris
PROFESSIONAL EXPERIENCE:
Confidential, Lowell, Arkansas
Data Scientist
Responsibilities:
- Utilized Apache Spark with Python to develop and execute Big Data analytics and Machine Learning applications; executed machine learning use cases under Spark ML and MLlib (see the illustrative sketch below).
- Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
- Used pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
- Worked on different data formats such as JSON and XML and applied machine learning algorithms to them.
- Worked with Data Architects and IT Architects to understand the movement of data and its storage.
- Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
- Implemented Agile Methodology for building an internal application.
- Good knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- As architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
- Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, and KNN.
- Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
- Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
- Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
- Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
- Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, and Business Objects.
- Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and Snowflake Schemas.
Environment: R, ODS, OLTP, Big Data, Oracle 10g, Hive, OLAP, DB2, Metadata, Python, MS Excel, Mainframes, MS Visio, Rational Rose, Teradata, SPSS, T-SQL, PL/SQL, Flat Files, XML, and Tableau.
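A minimal, illustrative sketch of the kind of Spark ML classification pipeline described in this role. This is an assumption-based example, not the project's actual code; the input path, the column names feature_1, feature_2, and label, and the application name are hypothetical placeholders.

    # Hedged sketch: minimal Spark ML classification pipeline (assumed setup).
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("classification-sketch").getOrCreate()

    # Load a pre-cleaned dataset (path is a hypothetical placeholder).
    df = spark.read.parquet("s3://example-bucket/training_data/")

    # Assemble raw columns into the single feature vector Spark ML expects.
    assembler = VectorAssembler(inputCols=["feature_1", "feature_2"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")

    # Fit the pipeline and inspect a few predictions.
    model = Pipeline(stages=[assembler, lr]).fit(df)
    model.transform(df).select("label", "prediction", "probability").show(5)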
Confidential, Santa Clara
Data Scientist
Responsibilities:
- Performed Data Profiling to learn about behavior across various features such as traffic pattern, location, date, and time.
- Applied various machine learning algorithms and statistical modeling techniques such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using the Scikit-learn package.
- Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, and dimensionality reduction, and used the engine to increase user lifetime by 45% and triple user conversions for Confidential categories.
- Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources; used the K-Means clustering technique to identify outliers and classify unlabeled data.
- Evaluated models using Cross Validation, Log Loss, ROC curves, and AUC for feature selection (see the illustrative sketch below).
- Analyzed traffic patterns by calculating autocorrelation with different time lags.
- Developed entire frontend and backend modules using Python on Django Web Framework.
- Implemented the presentation layer with HTML, CSS, and JavaScript.
- Involved in writing stored procedures using Oracle.
- Addressed overfitting by implementing regularization methods such as L2 and L1.
- Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
- Identified and targeted welfare high-risk groups with Machine learning algorithms.
- Developed Tableau visualizations and dashboards using Tableau Desktop.
- Created clusters to classify Control and test groups and conducted group campaigns.
- Developed Linux Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza .
- Developed triggers, stored procedures, functions, and packages using cursors and ref cursor concepts associated with the project using PL/SQL.
- Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time for the new route.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data.
- Used MLlib, Spark's Machine learning library to build and evaluate different models.
- Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages.
- Developed a MapReduce pipeline for feature extraction using Hive.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau .
- Communicated results to the operations team to support better decision-making.
- Collected data needs and requirements by interacting with other departments.
Environment: Python, CDH5, HDFS, Hadoop, Hive, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.
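An illustrative sketch of the model-evaluation approach referenced above (cross-validation, ROC/AUC, L2 regularization to address overfitting). The synthetic dataset is a hypothetical stand-in for the confidential project data; this is not the project's actual code.

    # Hedged sketch: L2-regularized logistic regression evaluated with cross-validated AUC.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.metrics import roc_auc_score

    # Synthetic placeholder data (the real project data is confidential).
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # L2 penalty addresses overfitting; C controls regularization strength.
    clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)

    # 5-fold cross-validated AUC on the training split.
    cv_auc = cross_val_score(clf, X_train, y_train, cv=5, scoring="roc_auc")
    print("CV AUC: %.3f +/- %.3f" % (cv_auc.mean(), cv_auc.std()))

    # Held-out AUC on the test split.
    clf.fit(X_train, y_train)
    print("Test AUC: %.3f" % roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))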
Confidential, Minneapolis, MN
Data Scientist
Responsibilities:
- Developed applications of Machine Learning, Statistical Analysis, and Data Visualization for challenging data processing problems in the sustainability and biomedical domains.
- Worked on Natural Language Processing with the NLTK module of Python for application development for automated customer response.
- Used predictive modeling with tools in SAS, SPSS, R, Python.
- Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
- Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, and Business Objects.
- Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Interaction with Business Analyst, SMEs and other Data Architects to understand Business needs and functionality for various project solutions.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
- Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Prepared ETL architecture and design documents covering ETL architecture, SSIS design, and the extraction, transformation, and loading of Duck Creek data into the dimensional model.
- Applied linear regression, multiple regression, the ordinary least squares method, mean-variance analysis, the law of large numbers, logistic regression, dummy variables, residuals, the Poisson distribution, Bayes, Naive Bayes, fitting functions, etc. to data with the help of Scikit-learn, SciPy, NumPy, and pandas (see the illustrative sketch below).
- Applied clustering algorithms, i.e., hierarchical and K-means, with the help of Scikit-learn and SciPy.
- Developed visualizations and dashboards using ggplot and Tableau.
- Worked on development of data warehouse, Data Lake, and ETL systems using relational and non-relational tools like SQL and NoSQL.
- Built and analyzed datasets using R, SAS, Matlab and Python (in decreasing order of usage).
- Applied linear regression in Python and SAS to understand the relationship between different attributes of the dataset and the causal relationship between them.
- Expertise in Business Intelligence and data visualization using R and Tableau.
- Validated the Macro-Economic data and predictive analysis of world markets using key indicators in Python and machine learning concepts like regression, Bootstrap Aggregation and Random Forest.
- Worked in large-scale database environments like Hadoop and MapReduce, with knowledge of the working mechanism of Hadoop clusters, nodes, and the Hadoop Distributed File System (HDFS).
- Interfaced with large-scale database systems through an ETL server for data extraction.
- Identified patterns, data quality issues, and opportunities and leveraged insights by communicating opportunities with business partners.
Environment: Machine Learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-Learn/SciPy/NumPy/Pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau.
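An illustrative sketch of the ordinary least squares regression work referenced above. The synthetic data and coefficient values are hypothetical placeholders, not details from the actual project.

    # Hedged sketch: ordinary least squares linear regression with residual diagnostics.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Synthetic placeholder data: three explanatory attributes and a noisy target.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

    # Ordinary least squares fit.
    model = LinearRegression().fit(X, y)
    print("coefficients:", model.coef_, "intercept:", model.intercept_)
    print("R^2:", model.score(X, y))

    # Residuals for diagnostic checks (e.g., normality, homoscedasticity).
    residuals = y - model.predict(X)
    print("residual mean: %.4f, std: %.4f" % (residuals.mean(), residuals.std()))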
Confidential, Austin, TX
Data Modeler/Data Analyst
Responsibilities:
- Communicated effectively, both verbally and in writing, with the client team.
- Completed documentation on all assigned systems and databases, including business rules and logic.
- Created test data and test case documentation for regression and performance testing.
- Designed, built, and implemented relational databases.
- Determined changes in physical database by studying project requirements.
- Developed intermediate business knowledge of the functional area and processes to understand how data and information support business functions.
- Facilitated the gathering of moderately complex business requirements by defining the business problem.
- Utilized SPSS statistical software to track and analyze data.
- Optimized data collection procedures and generated reports on a weekly, monthly, and quarterly basis.
- Used advanced Microsoft Excel to create pivot tables, used VLOOKUP and other Excel functions.
- Successfully interpreted data to draw conclusions for managerial action and strategy.
- Created Data chart presentations and coded variables from original data, conducted statistical analysis as and when required and provided summaries of analysis.
Environment: Data Analysis, SQL, FTP, SFTP, XML, Web Services
Confidential
Data Analyst
Responsibilities:
- Processed data received from vendors and loaded it into the database. The process was carried out on a weekly basis and reports were delivered bi-weekly.
- Documented requirements and obtained signoffs.
- Coordinated between the Business users and development team in resolving issues.
- Documented data cleansing and data profiling.
- Wrote SQL scripts to meet the business requirements.
- Analyzed views and produced reports.
- Tested cleansed data for integrity and uniqueness.
- Automated the existing system to achieve faster and more accurate data loading.
- Learned to create Business Process Models.
- Managed multiple projects simultaneously, tracking them against varying timelines through a combination of business and technical skills.
- Good understanding of clinical practice management, medical and laboratory billing, and insurance claim processing, with process flow diagrams.
- Assisted QA team in creating test scenarios that cover a day in a life of the patient for Inpatient and Ambulatory workflows.
Environment: SQL, data profiling, data loading, QA team.
Confidential
Data Analyst
Responsibilities:
- Used Microsoft Visio and Rational Rose to design the Use Case diagrams, Class model, Sequence diagrams, and Activity diagrams for the SDLC process of the application.
- Worked with other teams to analyze customers and marketing parameters.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Was a part of the complete life cycle of the project, from requirements to production support.
- Created test plan documents for all back-end database modules.
- Used MS Excel, MS Access and SQL to write and run various queries.
- Used a traceability matrix to trace the requirements of the organization.
- Recommended structural changes and enhancements to systems and databases.
- Supported maintenance in the testing team for System Testing, Integration Testing, and UAT.
- Ensured quality in the deliverables.
Environment: UNIX, SQL, Oracle 10g, MS Office, MS Visio.