
Data Scientist Resume

Portsmouth, NH


  • 8+ years of experience in IT, including 5+ years as a Data Scientist, with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions.
  • Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.
  • Experience designing visualizations in Tableau and publishing and presenting dashboards and storylines on web and desktop platforms.
  • Designed the physical data architecture of new system engines.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis; good knowledge of Recommender Systems.
  • Proficient in statistical modeling and machine learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) for forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles.
  • Extracted and worked with data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
  • Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
  • Regularly used JIRA and other internal issue trackers for project development.
  • Skilled in system analysis, E-R/dimensional data modeling, database design, and implementing RDBMS-specific features.
  • Expertise in all aspects of the Software Development Life Cycle (SDLC): requirements analysis, design, development/coding, testing, implementation, and maintenance.
  • Hands-on working experience in machine learning and statistics to draw meaningful insights from data; strong at communication and storytelling with data.
  • Used analytical applications and libraries such as Plotly, D3.js, and Tableau to identify trends and relationships between different pieces of data, draw appropriate conclusions, and translate analytical findings into marketing strategies that drive value.
  • Experienced in working with enterprise search platforms like Apache Solr and distributed real-time processing systems like Storm.
  • Hands-on experience with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction.
  • Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiments), machine learning, algorithms, data structures, and data infrastructure.
  • Solid team player, team builder, and an excellent communicator.
  • Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, scikit-learn, and Hadoop MapReduce.
  • Technical proficiency in designing and data modeling online applications; served as solution lead for architecting data warehouse/business intelligence applications.
  • Expertise in implementing core Java concepts and JEE technologies: JSP, Servlets, JSTL, EJB, JMS, Struts, Spring, Hibernate, JDBC, XML, Web Services, and JNDI.
  • Extensive experience working in Test-Driven Development and Agile Scrum environments.
  • Experience working on Windows, Linux, and UNIX platforms, including programming and debugging skills in UNIX shell scripting.
  • Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, Cosmos.
  • Defined job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
  • Experience in data migration from existing data stores to Hadoop.
  • Developed MapReduce programs to perform data transformation and analysis.


Languages: C, C++, Python, T-SQL, PL/SQL, SQL, XML, HTML, DHTML, HTTP, Matlab, DAX.

Databases: SQL Server, MS-Access, Oracle 11g/10g/9i, Teradata, Big Data, Hadoop

DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: ERWIN 4.5/4.0, MS Visio, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon methodologies

Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .Net, Microsoft Management Console, Visual Source Safe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA


Confidential, Portsmouth, NH

Data Scientist


  • Set up storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Used pandas, NumPy, Seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Installed and used the Caffe deep learning framework.
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization; performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView.
  • Implemented Agile methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
  • Good knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for clients.
  • Identified and executed process improvements; hands-on in various technologies such as Oracle, Informatica, and Business Objects.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.

Environment: R 9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose.

Confidential, CA

Data Scientist


  • Analyzed the business requirements of the project by studying the Business Requirement Specification document.
  • Worked extensively with the Erwin Data Modeler tool to design data models.
  • Designed a mapping to process the incremental changes that exist in the source table. Whenever source data elements were missing in source tables, these were modified/added in consistency with the third normal form based OLTP source database.
  • Designed tables and implemented the naming conventions for Logical and Physical Data Models in Erwin 7.0.
  • Participated in the conversion of the ITS (Immigration Tracking System) Visual Basic client-server application into a C#, ASP.NET 3-tier intranet application.
  • Performed Exploratory Data Analysis and data visualization using R and Tableau.
  • Performed thorough EDA with univariate and bivariate analysis to understand intrinsic and combined effects.
  • Worked with Data Governance, Data Quality, Data Lineage, and Data Architect teams to design various models and processes.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • Utilized the ADO.NET object model to implement middle-tier components that interacted with an MS SQL Server 2000 database.
  • Participated in the AMS (Alert Management System) Java and Sybase project. Designed the Sybase database using Erwin. Customized error messages using sp_addmessage and sp_bindmsg. Created indexes and optimized queries. Wrote stored procedures and triggers in T-SQL.
  • Explained the data model to the other members of the development team. Wrote an XML parsing module that populates alerts from the XML file into the database tables using Java, JDBC, BEA WebLogic IDE, and the Document Object Model.
  • As an Architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
  • Developed, implemented, and maintained the Conceptual, Logical, and Physical Data Models using Erwin for forward- and reverse-engineered databases.
  • Explored and extracted data from source XML in HDFS, preparing data for exploratory analysis using data munging.

Environment: SQL Server 2008R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Hadoop, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer.

Confidential, South Portland

Data Scientist


  • Coded R functions to interface with the Caffe deep learning framework.
  • Worked in the Amazon Web Services cloud computing environment.
  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and space-time.
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
  • Gathered all required data from multiple data sources and created datasets for use in the analysis.
  • Performed Exploratory Data Analysis and data visualization using R and Tableau.
  • Performed thorough EDA with univariate and bivariate analysis to understand intrinsic and combined effects.
  • Worked with Data Governance, Data Quality, Data Lineage, and Data Architect teams to design various models and processes.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • As an Architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
  • Developed, implemented, and maintained the Conceptual, Logical, and Physical Data Models using Erwin for forward- and reverse-engineered databases.
  • Established data architecture strategy, best practices, standards, and roadmaps.
  • Led the development and presentation of a data analytics data-hub prototype with the help of other members of the emerging solutions team.
  • Performed data cleaning and imputation of missing values using R.
  • Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
  • Took up ad-hoc requests from different departments and locations.
  • Used Hive to store the data and perform data cleaning steps for huge datasets.
  • Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
  • Created customized business reports and shared insights with management.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions to address those needs.

Environment: Erwin r, Informatica, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.

Confidential, Irvine, California

Data Scientist


  • Performed statistical modeling with ML to draw insights from data under the guidance of the Principal Data Scientist.
  • Performed data modeling with Pig, Hive, and Impala.
  • Handled data ingestion with Sqoop and Flume.
  • Used SVN to commit changes into the main EMM application trunk.
  • Understood and implemented text mining concepts, graph processing, and semi-structured and unstructured data processing.
  • Worked with Ajax API calls to communicate with Hadoop through an Impala connection and SQL to render the required data. These API calls are similar to Microsoft Cognitive API calls.
  • Good working knowledge of Cloudera and HDP ecosystem components.
  • Used Elasticsearch (Big Data) to retrieve data into the application as required.
  • Ran MapReduce programs on the cluster.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Analyzed the partitioned and bucketed data and computed various metrics for reporting.
  • Involved in loading data from RDBMS and weblogs into HDFS using Sqoop and Flume.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Developed Hive queries for Analysis across different banners.
  • Extracted data from Twitter using Java and the Twitter API. Parsed JSON-formatted Twitter data and uploaded it to the database.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
  • Exported the result set from Hive to MySQL using Sqoop after processing the data.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Hands-on experience working with Sequence files, Avro, and HAR file formats and compression.
  • Used Hive to partition and bucket data.
  • Wrote MapReduce programs with the Java API to cleanse structured and unstructured data.
  • Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
  • Created HBase tables to store various data formats of data coming from different portfolios.
  • Worked on improving the performance of existing Pig and Hive queries.

Environment: SQL Server, Oracle, MS Office, Teradata 14.1, Informatica, ER Studio, XML, Business Objects, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, R Studio, Mahout, Java, Hive, AWS.


Data Architect/Data Modeler


  • Worked with large amounts of structured and unstructured data.
  • Knowledge of Machine Learning concepts (Generalized Linear models, Regularization, Random Forest, Time Series models, etc.)
  • Worked in Business Intelligence tools and visualization tools such as Business Objects, Tableau, ChartIO, etc.
  • Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
  • Configured the project on WebSphere 6.1 application servers
  • Implemented the online application by using Core Java, JDBC, JSP, Servlets and EJB 1.1, Web Services, SOAP, WSDL.
  • Handled end-to-end project from data discovery to model deployment.
  • Monitored the automated loading processes.
  • Communicated with other health care information systems using Web Services with SOAP, WSDL, and JAX-RPC.
  • Used Singleton, Factory, and DAO design patterns based on the application requirements.
  • Used SAX and DOM parsers to parse the raw XML documents
  • Used RAD as Development IDE for web applications.
  • Prepared and executed unit test cases.
  • Used Log4J logging framework to write Log messages with various levels.
  • Involved in fixing bugs and minor enhancements to the front-end modules.
  • Used Microsoft Visio and Rational Rose to design the use case diagrams, class model, sequence diagrams, and activity diagrams for the SDLC process of the application.
  • Performed functional and technical reviews.
  • Supported the testing team during system testing, integration testing, and UAT.
  • Ensured quality in the deliverables.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Took part in the complete life cycle of the project, from requirements to production support.
  • Created test plan documents for all back-end database modules.
  • Implemented the project in a Linux environment.

Environment: R, Erwin, Tableau, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, R Studio, Mahout, Java, Hive, AWS.


Data Analyst/Data Modeler


  • Developed an Internet traffic scoring platform for ad networks, advertisers, and publishers (rule engine, site scoring, keyword scoring, lift measurement, linkage analysis).
  • Responsible for defining the key identifiers for each mapping/interface.
  • Clients include eBay, Click Forensics, Cars.com, Turn.com, Microsoft, and Looksmart.
  • Implemented a Metadata Repository and maintained Data Quality, Data Cleanup procedures, Transformations, Data Standards, the Data Governance program, scripts, stored procedures, triggers, and execution of test plans.
  • Designed the architecture for one of the first Analytics 3.0 online platforms: all-purpose scoring with on-demand, SaaS, and API services. Currently under implementation.
  • Applied web crawling and text mining techniques to score referral domains, generate keyword taxonomies, and assess the commercial value of bid keywords.
  • Developed a new hybrid statistical and data mining technique known as hidden decision trees and hidden forests.
  • Reverse-engineered keyword pricing algorithms in the context of pay-per-click arbitrage.
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
  • Automated bidding for advertiser campaigns based either on keyword or category (run-of-site) bidding.
  • Created multimillion bid keyword lists using extensive web crawling. Identified metrics to measure the quality of each list (yield or coverage, volume, and keyword average financial value).
  • Updated the Enterprise Metadata Library with any changes or updates.
  • Documented data quality and traceability for each source interface.
  • Established standards of procedures.
  • Generated weekly and monthly asset inventory reports.

Environment: Erwin r, SQL Server 2000/2005, Windows XP/NT/2000, Oracle, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica.
