
Data Scientist Resume

Lowell, Arkansas


  • Over 8 years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Extensive experience in Text Analytics, developing various statistical Machine Learning and Data Mining solutions to business problems, and generating data visualizations using R, Python, and Tableau.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Experience in designing visualizations using Tableau software and publishing and presenting dashboards and storylines on web and desktop platforms.
  • Designed the physical data architecture of new system engines.
  • Hands-on experience in implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
  • Developing Logical Data Architecture with adherence to Enterprise Architecture.
  • Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
  • Adept in statistical programming languages like R and Python, including Big Data technologies like Hadoop and Hive.
  • Skilled in using dplyr in R and pandas in Python for performing exploratory data analysis.
  • Experience working with data modeling tools like Erwin, PowerDesigner, and ER/Studio.
  • Experience in designing star schema and snowflake schema for Data Warehouse and ODS architecture.
  • Experience and technical proficiency in designing and data modeling online applications; solution lead for architecting Data Warehouse/Business Intelligence applications.
  • Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities like BTEQ, FastLoad, MultiLoad, and FastExport.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables, and OLAP reporting.
  • Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
  • Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
  • Worked with and extracted data from various database sources like Oracle, SQL Server, and DB2; regularly used JIRA and other internal issue trackers for project development.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design, and implementing RDBMS-specific features.
  • Knowledge of working with Proofs of Concept (PoCs) and gap analysis; gathered necessary data for analysis from different sources and prepared data for exploration using data munging and Teradata.
  • Well experienced in Normalization & De-Normalization techniques for optimum performance in relational and dimensional database environments.
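
As a brief illustration of the exploratory data analysis workflow mentioned above (missing-value profiling, imputation, and aggregation with pandas), here is a minimal, self-contained sketch; the column names and values are hypothetical.

```python
# Minimal EDA sketch with pandas; columns and values are made up for illustration.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "sales":  [100.0, 250.0, np.nan, 300.0, 150.0],
})

# Profile missing values before any modeling step.
missing = df["sales"].isna().sum()  # 1 missing observation

# Simple imputation: fill missing sales with the column median (200.0 here).
df["sales"] = df["sales"].fillna(df["sales"].median())

# Aggregate by a categorical column to summarize the data.
by_region = df.groupby("region")["sales"].mean()
print(missing, by_region.to_dict())
```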


Languages: C, C++, Java 8, Python, R

Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, Twitter, NLP, reshape2, rjson, pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, Beautiful Soup, Rpy2.

Web Technologies: JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL

Data Modeling Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner, Text Mining, and Google Cloud Vision.

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.

Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra.

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, and Cognos 7.0/6.0.

ETL Tools: Informatica PowerCenter, SSIS.

Version Control Tools: SVN, GitHub.

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure SQL Data Warehouse

Operating Systems: Windows, Linux, UNIX, macOS, Red Hat.

Others: MicroStrategy, TensorFlow, QlikView Data Models, OLAP databases/cubes, scorecards, dashboards, reports, MongoDB 3.3, statistical machine learning, and Python.


Confidential, Lowell, Arkansas

Data Scientist


  • Built models using statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest.
  • Completed a highly immersive Data Science program involving Data Manipulation and Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Used pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, NLTK in Python for developing various machine learning algorithms.
  • Installed and used the Caffe Deep Learning Framework.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization; performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems.
  • Good knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, SecondaryNameNode, and MapReduce concepts.
  • As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MLOAD for data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Experience in Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume, including their installation and configuration.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that we could assign each document a response label for further classification.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Validated the machine learning classifiers using ROC Curves and Lift Charts.
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
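
The ROC-based classifier validation mentioned above can be sketched as follows with scikit-learn; the labels and scores below are toy values, not project data.

```python
# Hedged sketch of classifier validation with an ROC curve and AUC score.
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [0, 0, 1, 1]            # ground-truth class labels (toy example)
y_score = [0.1, 0.4, 0.35, 0.8]   # classifier probability scores (toy example)

# Area under the ROC curve summarizes ranking quality in one number.
auc = roc_auc_score(y_true, y_score)

# The curve itself: false-positive rate vs. true-positive rate per threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(f"AUC = {auc:.2f}")  # 0.75 for these toy scores
```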

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, Git, Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce.

Confidential, New York, NY

Data Scientist


  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, spacetime.
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
  • Gathered all the data required from multiple data sources and created datasets to be used in analysis.
  • Performed Exploratory Data Analysis and Data Visualizations using R and Tableau.
  • Used Kibana, an open source plugin for Elasticsearch, for analytics and data visualization.
  • Performed thorough EDA, with univariate and bivariate analysis, to understand the intrinsic and combined effects.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given PoCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • Researched improvements to the IVR used internally at Confidential.
  • Developed an IVR for clinics so that callers can receive anonymous access to test results.
  • Performed data cleaning and imputation of missing values using R.
  • Worked with the Hadoop ecosystem covering HDFS, HBase, YARN, and MapReduce.
  • Used Apache Kafka and Flink.
  • Used Thrift for cross-platform integration.
  • Took up ad-hoc requests from different departments and locations.
  • Determined regression model predictors using a correlation matrix for factor analysis in R.
  • Built a regression model to understand the order fulfillment time lag issue using scikit-learn in Python.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Empowered decision makers with data analysis dashboards using Tableau and Power BI.
  • Interfaced with other technology teams to extract, transform, and load (ETL) data from a wide variety of data sources.
  • Owned the functional and non-functional scaling of software systems in my area of ownership.
  • Provided input and recommendations on technical issues to BI Engineers, Business & Data Analysts, and Data Scientists.
  • As an Architect implemented MDM hub to provide clean, consistent data for a SOA implementation.
  • Developed, Implemented & Maintained the Conceptual, Logical & Physical Data Models using Erwin for Forward/Reverse Engineered Databases.
  • Established Data architecture strategy, best practices, standards and roadmaps.
  • Led the development and presentation of a data analytics data-hub prototype with the help of other members of the emerging solutions team.
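
A minimal sketch of the kind of regression modeling described above (scikit-learn in Python, in the spirit of the order-fulfillment lag model); the data here is synthetic and the feature is hypothetical.

```python
# Illustrative regression sketch with scikit-learn on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))               # hypothetical feature, e.g. order volume
y = 2.0 * X[:, 0] + 5.0 + rng.normal(0, 0.1, 200)   # lag in hours, with small noise

# Fit an ordinary least-squares model; coefficients should recover ~2.0 and ~5.0.
model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)
```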

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.

Confidential, Denver, CO

Data Analyst


  • Developed complex stored procedures, triggers, functions, indexes, tables, views, and other SQL joins for applications while implementing various types of constraints on tables.
  • Used pivot tables, VLOOKUP, data validation, conditional formatting, and graph and chart manipulation for extensive data cleaning and analysis.
  • Functioned as a dual validator on multiple business projects requiring dual data validation and data consistency.
  • Contributed to special projects by gathering and organizing data to monitor and report escalation issues to management.
  • Efficiently handled an inventory of laws, rules, and regulations related to technology, including the following:
  • Development of monitoring and testing coverage plans;
  • Planning and interpretation of monitoring and testing activities to evaluate control effectiveness; and
  • Identification and resolution of issues through compliance risk assessments, activity monitoring, and testing
  • Researched source data, designed queries, and brought data into Tableau; interacted with the clients throughout the development process
  • Contributed to and helped the team define best practices around report development and report scheduling
  • Spearheaded global technology compliance risk work with regional technology compliance officers in charge of regional execution and technology compliance risk coverage
  • Employed technical skills in writing T-SQL scripts to manipulate data loads and extracts, SAS programs to perform ad hoc analysis and data manipulation, and SAS macros to improve reporting
  • Created high-performance data integration solutions, such as load (ETL) packages, using SSIS
  • Expertly drafted various Teradata SQL queries through the development of SET or MULTISET tables, views, and volatile tables, using inner and outer joins, string functions, and techniques such as ROW_NUMBER, via Teradata SQL Assistant for data pull requests
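
The SQL data-pull work described above (inner joins and aggregation over relational tables) can be illustrated with a portable sketch; sqlite3 stands in for Teradata here, and the table names and rows are invented.

```python
# Portable SQL sketch: joins plus aggregation, analogous to a Teradata data pull.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
cur.execute("CREATE TABLE customers (name TEXT, region TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "acme", 120.0), (2, "acme", 80.0), (3, "globex", 50.0)])
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [("acme", "East"), ("globex", "West")])

# Inner join the fact table to its dimension, then aggregate by region.
rows = cur.execute("""
    SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.name = o.customer
    GROUP BY c.region
""").fetchall()
totals = {region: (n, total) for region, n, total in rows}
print(totals)
```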

Confidential, McLean, Virginia

Data Architect/ Data Modeler


  • Worked with the BI team to gather report requirements and used Sqoop to export data into HDFS and Hive
  • Worked on MapReduce jobs in Java for data cleaning and pre-processing.
  • Assisted with data capacity planning and node forecasting.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
  • Developed MapReduce programs to extract and transform the data sets and results were exported back to RDBMS using Sqoop.
  • Was responsible for importing the data (mostly log files) from various sources into HDFS using Flume.
  • Created tables in Hive and loaded the structured data (resulting from MapReduce jobs).
  • Developed many queries using HiveQL and extracted the required information.
  • Exported the required information to an RDBMS using Sqoop to make the data available to the claims processing team to assist in processing claims based on the data.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, and schema design.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
  • Involved in defining the source-to-target data mappings, business rules, and business and data definitions.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source-to-target interface.
  • Implemented a metadata repository and maintained data quality and data cleanup procedures.
  • Involved in the Database Designing (Relational and Dimensional models) using Erwin.
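
The parse/stage/partition pattern described above (raw log lines parsed, staged, and stored in partitioned tables) can be sketched in pure Python; the log format and partitioning scheme here are invented for illustration.

```python
# Pure-Python sketch of a parse -> staging -> partitioned-store pipeline.
from collections import defaultdict

raw_logs = [
    "2015-03-01|claim|1001",
    "2015-03-01|claim|1002",
    "2015-03-02|quote|1003",
]

# "Map" step: parse each raw line into a structured staging record.
staging = [dict(zip(("date", "event", "id"), line.split("|"))) for line in raw_logs]

# Load step: store refined records partitioned by date, mirroring
# date-partitioned EDW tables.
partitions = defaultdict(list)
for rec in staging:
    partitions[rec["date"]].append(rec)

print(sorted(partitions), [len(partitions[d]) for d in sorted(partitions)])
```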

Environment: SQL/Server, Oracle 9i, MS-Office, Teradata, Informatica, ER Studio, XML, Business Objects, Help-Point Claims Services.


Data Analyst


  • Performed data profiling in the source systems that are required for New Customer Engagement (NCE) Datamart.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
  • Manipulated, cleansed, and processed data using Excel, Access, and SQL.
  • Responsible for loading, extracting and validation of client data.
  • Liaising with end-users and 3rd party suppliers.
  • Analyzed raw data, drew conclusions, and developed recommendations; wrote SQL scripts to manipulate data for data loads and extracts.
  • Developed data analytical databases from complex financial source data; performed daily system checks; handled data entry, data auditing, data report creation, and monitoring of all data for accuracy; designed, developed, and implemented new functionality.
  • Monitored the automated loading processes; advised on the suitability of methodologies and suggested improvements.
  • Involved in defining the source to target data mappings, business rules, and business and data definitions.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Documented, clarified, and communicated change requests with the requestor and coordinated with the development and testing teams; reverse engineered all the source databases using Embarcadero.
  • Coordinated with business users to provide an appropriate, effective, and efficient design for new reporting needs based on the existing functionality.
  • Documented data quality and traceability documents for each source interface.
  • Designed and implemented data integration modules for Extract/Transform/Load (ETL) functions.
  • Involved in Data Warehouse and Data Mart design; experience with various ETL and data warehousing tools and concepts.
  • Worked with internal architects, assisting in the development of current and target state data architectures.
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
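
A minimal sketch of a source-to-target mapping like those defined above (extract/transform/load of source columns into target columns); the column names and transformation rules are hypothetical.

```python
# Source-to-target mapping sketch: each source column maps to a target
# column with a transformation rule, as in an ETL mapping document.
source_rows = [
    {"cust_nm": " Alice ", "ord_amt": "120.50"},
    {"cust_nm": "Bob",     "ord_amt": "80.00"},
]

# Mapping document: source column -> (target column, transformation rule).
mapping = {
    "cust_nm": ("customer_name", str.strip),   # trim whitespace
    "ord_amt": ("order_amount", float),        # cast text to numeric
}

# Apply the mapping to produce target-conformant rows.
target_rows = [
    {tgt: rule(row[src]) for src, (tgt, rule) in mapping.items()}
    for row in source_rows
]
print(target_rows)
```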

Environment: SQL/Server, Oracle 10g & 11g, MS-Office, Netezza, Teradata, Enterprise Architect, Informatica Data Quality, ER Studio, TOAD, Business Objects, Greenplum Database, PL/SQL.


Database Analyst


  • Installed and configured Oracle Database 10g software on Red Hat Linux, HP-UX, and Windows NT.
  • Involved in the creation of different database objects like tables, triggers, indexes, views, constraints, stored procedures, and materialized views using PL/SQL and SQL, and performed table partitioning of tablespaces on different schemas.
  • Integrated the Backup Status Reporting for all the databases in the enterprise to enable the web reporting using Oracle SQL, PL/SQL and Shell Scripting.
  • Performed migration of databases, schemas, and tables using SQL*Loader and utilities like Import and Export dump and the Transportable Tablespaces feature of Oracle Database 10g.
  • Extensively involved in dimension building and report generation using the Hyperion 9.0 reporting tool.
  • Installed, configured, and administered Oracle Application Server (OAS), Windows app servers, and web servers like WebLogic, WebSphere, J2EE, and Apache HTTPD.
  • Performed application performance tuning by identifying poorly written SQL, providing inputs to the application programmer, and correcting and implementing the right components.
  • Worked extensively on Unix shell scripting, Korn shell, AWK, and CVS.
  • Worked extensively with Java application development environments, including Java-to-database connectivity using JDBC and ODBC, and installation of Tomcat Server, J2EE, and ASP.NET.
  • Involved in data migration from SQL Server and DB2 to Oracle databases.
  • Performance tuning of application using OEM, Query Optimizer, Toad to reduce execution time of the query and check joins used by queries.
  • Worked on Oracle E-Business Suite R12 to generate various financial reports like Account Payable, Account Receivable.
  • Communicated with Oracle to get the necessary patches required to fix database changes using TAR support, and coordinated and implemented the necessary changes.
  • Worked on SQL Server 2005/2000, MS SQL, and IBM DB2 databases for development of web applications and performance tuning using Transact-SQL (T-SQL).
  • Performed day-to-day Oracle maintenance, management, and performance tuning on various types of databases in development, test, regression, and production environments; proactively monitored the production environment for any exceptions/issues.
  • Identified issues/bottlenecks that could potentially cause problems, involved the right teams (such as server administrators or application developers), and took corrective action to rectify errors using a trouble ticket system.

Environment: Oracle 10g, SQL Server, DB2, OEM, RMAN, Hyperion, Red Hat RHEL 5/HP-UX 11i v3, Windows, UNIX Shell Programming, AWK, Perl, SQL, PL/SQL, T-SQL, Toad 9.5, Erwin.
