
Data Scientist Resume

Reston, VA


  • 8+ years of hands-on experience and comprehensive industry knowledge of Machine Learning, Statistical Modeling, Predictive Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining, Natural Language Processing (NLP), and Artificial Intelligence algorithms.
  • Experienced in utilizing analytical applications like R, SPSS, and Python to identify trends and relationships between different pieces of data, draw appropriate conclusions and translate analytical findings into risk management and marketing strategies that drive value.
  • Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, generating data visualizations using R and Python, and creating dashboards using tools like Tableau.
  • Expertise in applying data mining techniques and optimization techniques in B2B and B2C industries and proficient in Machine Learning, Data/Text Mining, Statistical Analysis, Sentiment Analysis and Predictive Modeling.
  • Experience building solutions for enterprises, context-awareness, pervasive computing, and/or application of machine learning.
  • Experienced in building data models using machine learning techniques for Classification, Regression, Clustering, and Association mining.
  • Skilled Tableau developer with expertise in building and publishing customized interactive reports and dashboards with customized parameters and user filters.
  • Experienced in Data Modeling grounded in RDBMS concepts, Logical and Physical Data Modeling up to Third Normal Form (3NF), and Multidimensional Data Modeling (Star schema, Snowflake schema, facts, and dimensions).
  • Imported data from different sources, such as AWS S3 and the local file system, into Spark RDDs.
  • Strong hands-on experience with Spark Core, Spark SQL, Spark Streaming, and Spark machine learning using the Scala and Python programming languages.
  • Solid Understanding of RDD Operations in Apache Spark i.e. Transformations & Actions, Persistence (Caching), Accumulators, Broadcast Variables.
  • Experience developing and maintaining applications written for Amazon Simple Storage Service (S3), AWS Elastic MapReduce (EMR), and AWS CloudFormation.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Experience in submitting Apache Spark and MapReduce jobs to YARN.
  • Worked with different Hadoop distributions, including Cloudera and Hortonworks.
  • Good experience working with cloud environments such as Amazon Web Services (AWS) EMR, EC2, and S3.
  • Developed Python code in Airflow for tasks, dependencies, SLA watchers, and time sensors for each job to manage and automate workflows.
  • Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
  • Experienced with databases and data formats including Oracle, XML, DB2, Teradata, Netezza, SQL Server, and NoSQL.
  • Good knowledge in Database Creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2 and SQL Server databases.
  • Experience in Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export using ETL tools such as Informatica PowerCenter.
  • Experienced in SAS/BASE, SAS/STAT, SAS/SQL, SAS/MACROS, SAS/GRAPH, SAS/ACCESS, SAS/ODS, SAS/QC, SAS/ETS in Mainframe, Windows and UNIX environments.
  • Experienced in Database performance tuning and Data Access optimization, writing complex SQL queries and PL/SQL blocks like stored procedures, Functions, Triggers, Cursors and ETL packages.
  • Experience in managing code on GitHub.


Programming & Scripting Languages: Python, R, Java, Scala, HTML, CSS, JavaScript

Databases: MS Access, Oracle 12c/11g/10g/9i, Teradata, Hadoop, PostgreSQL.

Statistical Software: SPSS, R, SAS.

ETL/BI Tools: Informatica PowerCenter 9.x, Tableau, Cognos BI 10, MS Excel, SAS, SAS/Macro, SAS/SQL

Web Packages: Google Analytics, Adobe Test & Target, Web Trends

BigData Ecosystem: HDFS, Spark, Pig, MapReduce, Hive, Sqoop, Flume, HBase, Storm, Kafka, Elasticsearch, Redis.

Statistical Methods: Time Series, regression models, splines, confidence intervals, principal component analysis and Dimensionality Reduction, bootstrapping

BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimensions tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

Cloud: AWS, S3, EC2.

Big Data / Grid Technologies: Cassandra, Coherence, MongoDB, ZooKeeper, Titan, Elasticsearch, Storm, Kafka, Hadoop


Data Scientist

Confidential - Reston, VA


  • Perform Data Profiling to learn about user behavior and merge data from multiple data sources.
  • Implemented big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig and Hive.
  • Designed and developed various machine learning frameworks using Python and Scala.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Architected Big Data solutions for projects and proposals using Hadoop, Spark, ELK Stack, Kafka, and TensorFlow.
  • Worked on clustering and classification of data using machine learning algorithms. Used TensorFlow to build sentiment and time series analyses.
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Created an end-to-end pipeline for model training, testing, and serving.
  • Developed documents and dashboards of predictions in MicroStrategy and presented them to the Business Intelligence team.
  • Used the Cloud Vision API to integrate vision detection features into applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content.
  • Implemented text mining to transform words and phrases in unstructured data into numerical values.
  • Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data.
  • As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Utilized OpenCV for human face recognition and tackled the challenge of long running time on a personal computer for face
  • Used Teradata 15 utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Handled importing data from various data sources, performed transformations using Hive and loaded data into HDFS.
  • Collaborated with data engineers to implement the ETL process; wrote and optimized SQL queries to extract data from the cloud and merge it with data from Oracle 12c.
  • Collected unstructured data from MongoDB 3.3 and completed data aggregation.
  • Stored training features in AWS S3.
  • Validated and published the built model to S3.
  • Trained TensorFlow models on GPU instances in AWS using batches of training samples from S3.
  • Served TensorFlow models using TensorFlow Serving.
  • Created a Docker image packaging the application with all its dependencies, ran a container from that image, then configured and deployed it.
  • Ability to build ML pipelines in Python or Scala in Spark.
  • Perform data visualization with Tableau 10 and generate dashboards to present the findings.
  • Recommend and evaluate marketing approaches based on quality analytics of customer consuming behavior.
  • Determine customer satisfaction and help enhance customer experience using NLP.
  • Work on Text Analytics, Naive Bayes, Sentiment analysis, creating word clouds and retrieving data from Twitter and other social networking platforms.

Environment: Python, Spark (MLlib, PySpark), Tableau, MicroStrategy, TensorFlow, AWS S3, MongoDB, exploratory analysis, feature engineering, K-Means Clustering, Hierarchical Clustering, Machine Learning, regression, logistic regression, Hadoop 2.7, OLTP, random forest, OLAP, HDFS, ODS, NLTK, JSON, XML.

Data Scientist

Confidential, Kansas City, MO


  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
  • Demonstrated experience in design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data lifecycle management in both RDBMS and Big Data environments.
  • Technical stack: Python 2.7, PyCharm, Anaconda, pandas, NumPy, R, Oracle.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Developed MapReduce Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Performed Source System Analysis, database design, data modeling for the warehouse layer using MLDM concepts and package layer using Dimensional modeling.
  • Developed Linux Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
  • Created SSIS packages (ETL) to migrate data from heterogeneous sources such as MS Excel, Flat Files, CSV files.
  • Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn.
  • Transformed and merged all the weekly client data into a yearly file using SSIS ETL packages.
  • Used Visual Studio Team Foundation Server for version control, source control, and reporting.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Hands-on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of the database.
  • Worked on customer segmentation using an unsupervised learning technique - clustering.
  • Worked with various Teradata 15 tools and utilities such as Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
  • Utilized MapReduce, Hadoop, Kafka, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.

Environment: Python, SQL, Oracle 12c, Netezza, SQL Server, Informatica, Erwin r9.6, SSRS, PL/SQL, Tableau, MLlib, regression, Cluster analysis, Scala NLP, Cassandra, MapReduce, Kafka, Hadoop, Hive, Teradata, random forest, OLAP, HDFS, XML, AWS.

BigData/ Hadoop Developer

Confidential - Oklahoma City, OK


  • Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
  • Responsible for installing, configuring, supporting, and managing of Hadoop Clusters.
  • Importing and exporting data into HDFS from Oracle 10.2 database and vice versa using SQOOP.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Designed and implemented HIVE queries and functions for evaluation, filtering, loading and storing of data.
  • Created HBase tables and column families to store the user event data.
  • Wrote automated HBase test cases for data quality checks using HBase command line tools.
  • Developed a data pipeline using HBase, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
  • Experience in collecting log data from different sources (such as web servers and social media) using Flume and storing it on HDFS to run MapReduce jobs.
  • Handled importing of data from machine logs using Flume.
  • Created Hive Tables, loaded data from Teradata using Sqoop.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
  • Configured, monitored, and optimized Flume agent to capture weblogs from the VPN server to be put into Hadoop Data Lake.
  • Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and wrote Pig/Hive UDFs.
  • Wrote, tested, and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, as well as DML and DDL.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala, and Python.
  • Exported the analyzed data to the relational databases using Sqoop to further visualize and generate reports for the BI team.
  • Developed ETL processes using Spark, Scala, Hive, and HBase.
  • Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
  • Wrote Java code to format XML documents and upload them to a Solr server for indexing.
  • Used NoSQL technology (Amazon DynamoDB) to gather and track event-based metrics.
  • Maintained all services in the Hadoop ecosystem using ZooKeeper.
  • Worked on implementing Spark framework.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Expertise in Extraction, Transformation, loading data from Oracle, DB2, SQL Server, MS Access, Excel, Flat Files and XML using Talend.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Helped design scalable Big Data clusters and solutions.
  • Followed agile methodology for the entire project.
  • Experience in working with Hadoop clusters using Cloudera distributions.
  • Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
  • Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Converted the existing relational database model to the Hadoop ecosystem.

Environment: Hadoop, HDFS, Pig, Hive, Flume, Sqoop, Oozie, Python, Shell Scripting, SQL, Talend, Spark, HBase, Elasticsearch, Linux (Ubuntu), Cloudera.

BigData Engineer

Confidential - Richmond, VA


  • Developed a data pipeline using Spark, Hive, and HBase to ingest customer behavioral data and financial histories into a Hadoop cluster for analysis.
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
  • Hands on experience in designing, developing, and maintaining software solutions in Hadoop cluster.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and Spark on YARN.
  • Experienced with Spark Streaming to ingest data into Spark Engine.
  • Designed the ETL runs performance tracking sheet in different phases of the project and shared with the Production team.
  • Imported data from different sources, such as AWS S3 and the local file system, into Spark RDDs.
  • Involved in converting Hive/SQL queries into Spark Transformations using Spark RDDs and Scala.
  • Involved in using SQOOP for importing and exporting data between RDBMS and HDFS.
  • Used Hive to analyze the Partitioned and Bucketed data and compute various metrics for reporting.
  • Involved in developing Hive DDL statements to create, alter, and drop Hive tables.
  • Involved in loading data from Linux file system to HDFS.
  • Integrated Cassandra with Talend and automated jobs.
  • Worked on cloud computing infrastructure (e.g. Amazon Web Services EC2) and considerations for scalable, distributed systems.
  • Involved in data warehousing and Business Intelligence systems.
  • Responsible for System performance management, Systems change / configuration management and Business requirements management.
  • Worked on converting the multiple SQL Server and Oracle stored procedures into Hadoop using Spark SQL, Hive, Scala, and Java.
  • Extensively used AWS services S3 for storing data and EMR for resource-intensive jobs.
  • Built and managed on-demand AWS Clusters using Qubole to process the daily web feeds.
  • Used HTML5, CSS3, JDBC Driver, JSP, AJAX, Google API and Web mashup.
  • Served as the primary contributor to designing, coding, testing, debugging, documenting, and supporting all types of applications consistent with established specifications and business requirements to deliver business value. Identified and designed the most efficient and cost-effective solutions through research and evaluation of alternatives.
  • Demonstrated Hadoop practices and broad knowledge of technical solutions, design patterns, and code for medium/complex applications deployed in Hadoop production.
  • Ingested semi-structured data using Flume and transformed it using Pig.


Java Developer



  • Involved in various phases of Software Development Lifecycle (SDLC) of the application like requirement gathering, Design, Analysis and code development.
  • Followed Agile Scrum methodology, involved in sprint planning, retros and code reviews.
  • Effectively involved in developing the logic to implement the requirements
  • Implemented MVC architecture using Spring MVC.
  • Involved in developing the class diagrams and sequence diagrams.
  • Involved in designing and developing the rich internet application using JSP, JavaScript, CSS, and HTML.
  • Actively involved in developing Servlet classes and unit testing.
  • Involved in writing the Spring configuration XML file that contains object declarations and the declarations of their dependent objects.
  • Involved in the JMS- queue configurations which are used to connect to the back-end systems.
  • Developed the complete business tier with stateless and stateful session beans, and session beans with JPA, following EJB standards.
  • Used JMS API for asynchronous communication by putting the messages in the Message queue.
  • Applied the Session Façade, Data Access Object, Data Transfer Object design patterns.
  • Developed and maintained user authentication and authorization by employing EJB 2.0.
  • Created new web services (SOAP over HTTP) and exposed them to the front-end layer for consumption.
  • Tested the created web services using SoapUI.
  • Created a RESTful web services interface to a Java-based runtime engine.
  • Worked with PostgreSQL: interacted with the database, wrote stored procedures, and debugged and fixed issues.
  • Analysed and fixed the defects raised in all testing phases (SIT, UAT, and performance testing).
  • Responsible for setup, installation of WebSphere Application Server on UNIX and Linux platforms in Test, DEV and PROD environments.
  • Configured WebSphere admin console.
  • Configured IBM HTTP Web server to work with WAS.
  • Created SQL views, queries, functions and triggers to be used to fetch data from the system.
  • Used Hibernate as the ORM tool to develop the persistence layer.
  • Implemented JDBC specification to connect to the database.
  • Utilized Java debugging and error handling classes and techniques to troubleshoot and debug issues.
  • Used Log4j for error reporting and debugging.
  • Worked with JUnit extensively to define various test suites and test cases.

Environment: Windows NT, WebSphere, Design Patterns, JDK, Spring, Hibernate, JSP, EJB 2.0, XML, jQuery, HTML, Eclipse, JMS, Web Services (SOAP over HTTP), Oracle 10g, Maven, JUnit, Log4j.

Java/J2EE Developer



  • Worked on detailed design and coding.
  • Implemented the validation, error handling and caching framework with Oracle Coherence cache.
  • Developed the interactive user interface using the jQuery JavaScript library.
  • Worked on developing the GUI using HTML, CSS, and JavaScript (jQuery).
  • Worked on implementing web pages on the website using the CodeIgniter framework.
  • Used Hibernate for database connectivity.
  • Utilized CSS, Ajax, jQuery and MySQL queries for website design and development.
  • Developed SQL scripts for data migration.
  • Worked with technologies such as jQuery and Ajax to make the website more attractive and user-friendly.
  • Gathered business requirements and prepared Software Requirement Specification (SRS) document. Created Visio charts for the workflow architecture of the system.
  • Collaborated with one team member in design, analysis, coding, testing, and website review.
  • Used iBatis framework with Spring framework for data persistence and transaction management.
  • Used Team Studio and Build Manager tools to develop applications and promote the new design to test environment.
  • Coordinated with business users on User Acceptance Tests (UAT) and obtained business approval for the design changes.

Environment: CSS, HTML, XHTML, JavaScript, Java, Photoshop, Illustrator, Fireworks, ColdFusion, Adobe Contribute, Windows XP.
