
Data Scientist Resume



  • Over 8 years of experience in Machine Learning, Statistical Modelling, Predictive Modelling, Data Analytics, Data Modelling, Data Architecture, Data Analysis, Data Mining, Text Mining, and Natural Language Processing (NLP).
  • Proficient in gathering and analyzing the Business Requirements with experience in documenting System Requirement Specifications (SRS) and Functional Requirement Specifications (FRS).
  • Extensive experience in Text Analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Experience in data mining using both R and SAS to further perform statistical tests (hypothesis testing) on the mined data.
  • Experience in multiple linear regression analysis using R to understand the influence of independent variables on the dependent variable.
  • Experience in agile and waterfall project management methodologies to produce high-quality deliverables that meet or exceed timeline and budgetary targets.
  • Experience in performing ANOVA to test differences among group means of categorical variables in the retail industry.
  • Performed market research and segmentation analysis using R, SAS and Excel to get the potential target customers and identify the market trend and opportunities.
  • Familiarity with R data processing functions like stack, merge, reshape, and substr, and plots like barplot, boxplot, ggplot, and qplot.
  • Good working experience with different file formats (PARQUET, TEXTFILE, AVRO, ORC) and different compression codecs (GZIP, SNAPPY, LZO).
  • Utilized R Studio to apply regression and classification methods (such as Random Forest) to predict the popularity; evaluated the performance of different models and explored the influential rank of factors for customer survey.
  • Knowledge of calculating and understanding the time complexities of different algorithms used in machine learning.
  • Experience with machine learning algorithms, such as KNN (K nearest neighbor), Random Forest, Decision tree, Factor analysis, etc.
  • Hands-on experience in data visualization and dashboard building with tools like Tableau.
  • Expertise in using various SAS report generating procedures like PROC REPORT, PROC SQL, PROC FREQ, PROC MEANS, PROC TABULATE, PROC TRANSPOSE and PROC PRINT.
  • Experience in handling client meetings to provide updates on current projects and to discuss future projects.
  • Strong experience working with healthcare data, including claims and membership data.
  • Experience in Base SAS, SAS Enterprise Guide, SAS Enterprise Miner, SAS/AML, SAS/STAT, SAS Macros, SAS/GRAPH, SAS/CONNECT, and SAS/ACCESS on Windows and Unix platforms, and competence in carrying out statistical analysis of large-scale data.
  • Modified existing SAS programs and created new SAS programs using SAS Macros to improve ease and speed of modification as well as consistency of results.
  • Performed quality assurance on other programmers' work: validating, debugging, documenting, and optimizing SAS programs for QC and QA.
  • Experience exporting SAS results to different formats such as XML, PDF, and Excel using SAS/Export and SAS/ODS for reporting and presentation.


Hadoop Ecosystem: Hadoop, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Oozie, Zena, Zeke Scheduling, Zookeeper, Flume, Kafka, Spark Core, Spark SQL, Spark Streaming, AWS, Azure Data Lake.

NoSQL Databases: HBase, Cassandra, MongoDB.

Build Management Tools: Maven, Apache Ant.

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans.

Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL, UNIX shell scripting.

Frameworks: MVC, Spring, Hibernate, Struts 1/2, EJB, JMS, JUnit, MR-Unit.

Version Control & CI: GitHub, Jenkins.

IDE and Tools: Eclipse 4.6, Netbeans 8.2, BlueJ.

Databases: Oracle 12c/11g, Microsoft SQL Server 2016/2014, DB2, and MySQL 4.x/5.x.

Methodologies: Software Development Lifecycle (SDLC), Waterfall, Agile, STLC (Software Testing Life cycle), UML, Design Patterns (Core Java and J2EE).

Web Technologies: HTML5/4, DHTML, AJAX, JavaScript, jQuery and CSS3/2, JSP, Bootstrap 3/3.5.


Confidential, KS

Data Scientist


  • Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Worked closely with the business analysts to convert the Business Requirements into Technical Requirements and prepared low and high level documentation.
  • Created Hive tables, loaded transactional data from Teradata using Sqoop, and worked with highly unstructured and semi-structured data totalling 2 petabytes.
  • Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
  • Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS.
  • Created various documents such as the Source-to-Target Data Mapping Document, Unit Test Cases, and Data Migration Document.
  • Imported data from structured data source into HDFS using Sqoop incremental imports.
  • Performed data synchronization between EC2 and S3, Hive stand-up, and AWS profiling.
  • Created Hive tables and partitions and implemented incremental imports to perform ad hoc queries on structured data.
  • Improved the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, and Spark on YARN.
  • Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud; performed export and import of data to and from S3.
  • Evaluated big data analytic tools, including Hive, Impala, and Sqoop, for importing data from RDBMS to HDFS.
  • Involved in creating a data lake by extracting customers' big data from various data sources into Hadoop HDFS.
  • Developed SQL scripts using Spark to handle different data sets and compared their performance against MapReduce jobs.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python.
  • Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
  • Wrote complex SQL and PL/SQL queries for stored procedures.
  • Used S3 buckets to store JARs and input datasets, and used DynamoDB to store the processed output from the input data sets.
  • Created MapReduce jobs running over HDFS for data mining and analysis using R; loaded and stored data with Pig scripts and R for MapReduce operations.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Worked with the NoSQL databases MongoDB and HBase, which differ from classic relational databases.
  • Involved in converting HiveQL queries into Spark transformations using Spark RDDs in Scala.
  • Integrated Kafka with Spark Streaming for high throughput and reliability.
  • Worked with Apache Flume to collect and aggregate large amounts of log data and store it on HDFS for further analysis.
  • Tuned Hive and Pig scripts to improve performance and resolved performance issues in both.
  • Used Singleton, DAO, DTO, Session Facade, MVC design Patterns.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.

Environment: Agile, Hive, Teradata, Sqoop, Storm, Kafka, HDFS, AWS, Data Mapping, EC2, S3, Hadoop, YARN, MapReduce, RDBMS, Data Lake, Python, Scala, DynamoDB, Flume, Pig, MongoDB, MVC.

Confidential, NJ

Data Scientist


  • Served as Data Scientist, working directly with business partners on data design and sourcing to leverage existing BI capabilities and scale applications for future advanced analytics.
  • Evaluated machine learning algorithms and data usage for scoring models and classification.
  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest using Python packages.
  • Studied business models and selected the best approaches to improve their performance; also analyzed data for trends.
  • Worked with the data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
  • Worked in Business and Data Analysis, Data Profiling, Data Migration, Data Integration, and Metadata Management Services.
  • Performed business analysis, requirements gathering, and functional and architectural work for Credit Risk Rating 3.
  • Implemented data collection, curation, and analysis scripts using Hadoop, Pig, and Hive technologies.
  • Developed NLP service to identify and extract text features for pre-populating fields in the client's data reporting and abstraction application.
  • Developed needs-based segmentation that aided management in gaining a deeper understanding of consumer behavior; these segments assisted management in the development and marketing of services.
  • Applied advanced machine learning algorithms including PCA, K-nearest neighbours, random forest, gradient boosting, neural networks, and XGBoost to predict weights and labels of Higgs boson with high accuracy.
  • Developed distributed data processing applications to automate the data cleaning and normalization process using the Hadoop stack.
  • Performed scoring and financial forecasting for collection priorities using Python and SAS machine learning algorithms.
  • Used Natural Language Processing (NLP) for response modeling and fraud detection efforts for credit cards.
  • Used Python, R, and Spark to develop a variety of models and algorithms for analytic purposes.
  • Designed and built large, complex data sets from disparate sources while thinking strategically about uses of data and how data use interacts with data design.
  • Performed source system analysis, data analysis, analysis of integration between modules, and analysis of the business meaning of source entities and relationships.
  • Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.

Environment: Python 3.6, Apache Spark, Kibana, IPython, Hadoop, Kafka, Pig, Hive, MLlib, Scikit-Learn, MySQL, SQL, Data Warehouse, Data Modeling, Middleware Integration, Gradient Boost, Random Forest, XGBoost, OpenCV, etc.

Confidential, Irvine, CA

Data Scientist


  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest using R and Python packages.
  • Used Python-based data manipulation and visualization tools such as Pandas, Matplotlib, and Seaborn to clean corrupted data before generating business-requested reports.
  • Developed parsing algorithms to clean and distribute large amounts of data. Regularly worked with datasets containing millions of records.
  • Developed personalized product recommendations with machine learning algorithms, including collaborative filtering and gradient boosting trees, to better meet the needs of existing customers and acquire new customers.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Decision Trees, Random Forests, Linear and Logistic Regression, SVM, Clustering, and neural networks.
  • Highly experienced in developing analytics and statistical models per organizational requirements, with the ability to produce alternative cost-effective and efficient models.
  • Extensive working experience with Python, including Scikit-learn, Pandas, and NumPy.
  • Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Performed data analysis using regression, data cleaning, Excel VLOOKUP, histograms, and the TOAD client; presented the analysis and suggested solutions to investors.
  • Prepared Python scripts to ensure proper data access, manipulation, and reporting functions.
  • Identified, analyzed and interpreted trends or patterns in complex data sets using data mining tools.
  • Determined customer satisfaction and helped enhance customer experience using NLP.

Environment: Python (NumPy, Pandas, PySpark, Scikit-learn, Matplotlib, NLTK), T-SQL, MS SQL Server, Data Lineage, XML, R Studio, Spyder, MATLAB, ETL, Machine Learning, Shiny, H2O, Oracle, Teradata, Java, Tableau.


Data Scientist


  • Analysed market research survey data collected by Cambridge Research and helped build a marketing and sales tool.
  • Performed market research and segmentation analysis using SAS, SQL and Excel to get the potential target customers and identify the market trend and opportunities.
  • Worked extensively on developing a questionnaire of 132 questions to understand customer behaviours and product penetration.
  • Used R Studio to plot graphs based on the survey data collected, and used SAS Text Miner to understand customer behaviour.
  • Using Text Miner on survey data was an entirely new task for Kellogg’s, and the output proved useful enough to extend Text Miner to other areas.
  • Applied multiple machine learning algorithms such as KNN (K nearest neighbour), Random Forest, Decision Tree, and Factor Analysis to identify new business opportunities and to understand incoming data in support of the marketing and sales team.
  • Worked on a special-case project to generate a special set of samples (slice sampling, Metropolis-Hastings algorithm) from the survey using MCMC (Markov chain Monte Carlo) in R Studio.
  • Performed advanced querying using SAS Enterprise Guide: calculated computed columns, applied filters, and manipulated and prepared data for reporting, graphing, summarization, and statistical analysis, finally generating SAS datasets.
  • Assigned tasks to team members after explaining the business requirements to them and helping them understand the code logic.
  • Also worked on frozen food data, using R Studio and SAS EG to analyse multiple categories of frozen foods based on the weights assigned to each category.
  • Tracked the project by arranging weekly meetings with the Global Insight managers and presenting the weekly report on a Tableau dashboard to give them a graphical view of customer segmentation.
  • Created complex, reusable macros and used SAS and R functions for data cleaning, validation, analysis, and report generation, as the data had 24,000 variables.
  • Extensively used various SAS procedures such as PROC SQL, PROC REPORT, PROC FORMAT, PROC TABULATE, PROC PRINT, PROC SORT, PROC FREQ, PROC MEANS, and PROC REG for reporting purposes.
  • Participated in meetings with the Insight Managers of 3 countries (UK, Australia, and Canada) to understand their requirements and help them build reports using SAS Enterprise Miner and EG.
  • Performed K-means cluster analysis to group customers based on 56 variables.
  • Responsible for handling requests from the marketing department and inquiries from the Customer Services related to statistics of existing data.
  • As a lead, trained the tech teams in Canada, Australia, and the UK to help them understand the business logic and to make sure the reports of all 3 regions were in sync.
  • The output consisted of 30 individual reports, which were replicated on Tableau dashboards.
  • Used parallel processing in SAS to aggregate programs to significantly reduce the execution time of the applications that fetch aggregated and partitioned data.

Environment: SQL, SAS EG, SAS Enterprise Miner, R Studio, SAS BI, Tableau, SAS DI, Microsoft SQL, IBM Informix, Visual Analytics, PuTTY, Teradata, Oracle, MS Office.


Data Modeller


  • Experience with analysis of Claims Data, Membership Data, and Sales Rep data in healthcare industries.
  • Attended business classes on the fundamentals of Clarity systems, Inpatient, Outpatient, Ambulatory, and Epic systems.
  • Designed and selected appropriate SAS procedures for each statistical analysis; test-ran SAS programs on mock data to ensure smooth analysis implementation.
  • Responsible for developing and submitting JCL jobs on the mainframe to complete various weekly, monthly, and quarterly processes.
  • Implemented an end-to-end ETL process on the IBM mainframe operating system.
  • Led and managed the Health Risk Assessment project, in which data files were fed electronically from the vendor.
  • Analyzed the Add & Term reports of each region using SAS tools like PROC FREQ, PROC TABULATE, PROC REPORT, PROC MEANS, and PROC TRANSPOSE.
  • Created PHI (Protected Health Information) reports in HTML, PDF, and RTF formats using the SAS/ODS facility.
  • Used SAS VA and SAS DI to represent the categorization of patients based on the Add & Term results.
  • Involved in gathering and analyzing business requirements and in the detailed technical design of ETL.
  • Produced reports to identify potential patients who meet the requirements for defined parameters.
  • Produced and managed annual hospital performance review packets, which include membership, revenue, and cost analysis for the hospital.
  • Maintained relationships with around 1,500 suppliers to stay informed of current market conditions and to connect our client with the best possible supplier.
  • Used Dynamic Data Exchange (DDE) feature of SAS for importing and exporting of data from and into SAS, MS Access and Excel.
  • Developed an understanding of relevant business processes, goals, and strategies for each individual healthcare client in order to provide comprehensive data analysis.
  • Attended weekly meetings with the client to report progress and recommend required changes.
  • Integrated analytic code written in Base SAS, SQL, and SAS/STAT to create a point-and-click reporting environment.
  • Maintained a healthy relationship between my team and the client; we never missed a deliverable or had an escalation.
  • Independently investigated the root causes of problems in data, backend tables, databases, SQL code, or SAS programs and described the nature of each technical issue.
  • Provided proper validation, including testing and documentation (e.g., requirements document, program validation), in accordance with company standards.

Environment: SAS Base, SAS/SQL, SAS/STAT, Reflection, Epic Systems, Web Portal, IBM Informix, Teradata SQL Assistant, ETL, Oracle SQL Developer, MS Office, UNIX, Windows, SAS/Access Oracle, PuTTY.


Data Analyst/Data Modeller


  • Assisted in maintaining the company’s credit facilities and relationships, including credit renewals, refinancing, future planning, and fulfilling all reporting and compliance requirements.
  • Independently created, executed, maintained, and validated programs that transfer data across multiple data management systems or operating systems.
  • Worked on survey data analysis to understand customer segmentation across regions.
  • Coded SAS/SQL programs using Base SAS and SAS Macros for ad hoc jobs.
  • Performed client market research on loan request information in commercial lending, BSA/Anti-Money Laundering (AML) administration, and third-party applications.
  • Designed and developed stored procedures, functions, views, and triggers to be used in the ETL process.
  • Assisted in long-term planning and budgeting processes, especially in the areas of cash and credit functions.
  • Built summary reports after identifying the customers, their occupancy period and the revenue generated using PROC SUMMARY, PROC MEANS and PROC FREQ.
  • Used SAS VA to design more creative end reports for the client, using features such as network diagrams and information maps.
  • Analysis included behavioural modelling, customer profiling, segmentation, trend analysis, predictive modelling, and optimizing open-ended offers.
  • Created and automated daily, weekly, and monthly analyses of the sales of all product categories.
  • Performed data analysis and statistical analysis; generated safety and efficacy tables, listings, and graphs using Base SAS and SAS Macros.

Environment: SAS BI, Windows, web portals, SAS Visual Analytics, Teradata, MS Office, Tableau, Oracle SQL Developer.
