Data Scientist Resume

Chantilly, VA


  • Over 8 years of experience in data analysis, data profiling, data integration, migration, data governance and metadata management, master data management, and configuration management.
  • Extensive experience across software development phases, including gathering, analyzing, and designing data, with expertise in documentation.
  • Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and building data visualizations using Python.
  • Experience in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Extensive knowledge of Apache Hadoop technologies such as Pig, Hive, Sqoop, Spark, Flume, and HBase.
  • Experience with big data querying tools such as Hive and Pig, and good experience with Python and R.
  • Extensive experience in building analytic dashboards using Tableau.
  • End-to-end experience in designing and deploying data visualizations using Tableau.
  • Good knowledge of using complex data types in Pig and MapReduce to handle and format data as required.
  • Experience working with tools and technologies such as Tableau, Birst, Power BI, MicroStrategy, Informatica, Cognos, and Ab Initio.
  • Exposure to Python, R, and big data and Hadoop technologies such as Spark, Pig, Hive, and MapReduce.
  • Knowledge of Hadoop file formats such as Parquet and JSON, and of various compression technologies.
  • Exploring opportunities in data science, including deep machine learning, natural language processing, and artificial intelligence.
  • Experience in big data analysis and developing data models using Hive, Pig, MapReduce, and SQL, with strong data architecture skills in designing data-centric solutions.
  • Extensive experience writing SQL queries, along with good experience developing Confidential -SQL and Oracle PL/SQL scripts, stored procedures, and triggers to implement business logic.
  • Worked on AWS, including Amazon Kinesis and Amazon Simple Storage Service (S3), with Spark Streaming, PySpark, and Spark SQL on top of an Amazon EMR cluster.
  • Wrote several SQL scripts for tasks such as finding long-running queries and blocking sessions, archiving data from production to an archive server, and populating reporting data.
  • Good knowledge of developing Informatica mappings, mapplets, sessions, workflows, and worklets for data loads from various sources such as Oracle, flat files, DB2, and SQL Server.
  • Experience in process improvement, Normalization/De-normalization, data extraction, data cleansing, and data manipulation.
  • Experience extracting data from various sources such as Oracle Database, flat files, and CSV files and loading it into the target warehouse.
  • Experience transforming and loading data from heterogeneous data sources into SQL Server using SQL Server Integration Services (SSIS) packages.
  • Extensive experience in relational and dimensional data modeling for creating logical and physical database designs and ER diagrams using data modeling tools such as Erwin and ER/Studio.
  • Experience in writing, testing, and implementing SQL queries using advanced analytical functions.
  • Knowledge of writing SQL queries and resolving key performance issues.
  • Good knowledge of Hadoop architecture and its components like HDFS, MapReduce, Job Tracker, Task Tracker, Name Node and Data Node.
  • Integration Architect and Data Scientist experience in analytics, big data, BPM, SOA, ETL, and cloud technologies.


Data Science Tools: Machine Learning, Deep Learning, Data Warehouse, Data Mining, Data Analysis, Big data, Visualizing, Data Munging, Data Modelling

Database: MySQL, Hive, Microsoft SQL Server 2014/2012/2008/2005, Teradata, MS Access, PostgreSQL, Netezza, Oracle.

Analysis and Modeling Tools: Erwin r9.6/r9.5/r9.1/r8.x, Sybase PowerDesigner, Oracle Designer, BPwin, ER/Studio, .1, MS Access 2000, Star-Schema, Snowflake-Schema Modeling, Fact and Dimension tables, Pivot Tables.

Reporting Tools: Business Objects, MS Excel Reports, MS Access Reports, Tableau reports, SSRS, SSIS, Crystal Reports

Operating Systems: Windows, Linux, Unix

Languages: SQL, Python (Pandas, SciPy, Sklearn, Matplotlib)

Machine Learning: Regression, Clustering, Random forest, SVM

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9., Ipython, Spyder, Spark

ETL Tools: SSIS, Pentaho, Informatica Power Center 9.7/9.6/9.5/9.1

Big Data: Hadoop, Map Reduce, HDFS 2, Hive, Pig, HBase, Sqoop, Spark.


Confidential, Chantilly, VA

Data Scientist


  • Gathered, documented, and implemented business requirements for analysis or as part of long-term document/report generation. Analyzed large volumes of data and provided results to technical and managerial staff.
  • Worked with various data pools and DBAs to gain access to data. Knowledge of NLP, NLTK, and text mining.
  • Programming knowledge of Scala, Spark, SQL, and Python.
  • Used K-means clustering to group similar data and documented the results.
  • Extracted, transformed, and loaded data into a Postgres database using Python scripts.
  • Data visualization: Pentaho, Tableau, D3, Django web app. Knowledge of numerical optimization, anomaly detection and estimation, A/B testing, statistics, and Maple. Applied big data analysis techniques using Hadoop, MapReduce, NoSQL, Pig/Hive, Spark/Shark, MLlib, Scala, NumPy, SciPy, pandas, and scikit-learn.
  • Researched and developed statistical learning models for data analysis. Collaborated with product management and engineering departments.
  • Used SAS for data analysis: analyzing client business needs, managing large data sets, and storing and extracting information. Performed feature engineering to convert arbitrary data into well-behaved data, handling categorical features, text features, image features, and missing data.
  • Knowledge of MicroStrategy schema objects such as facts, attributes, transformations, and hierarchies as per the Technical Design Document.
  • Used Hadoop (MapReduce, Hive, Pig) to store, process, and analyze a large volume of unstructured data covering 100 million users, in order to offer a gift card to the top 10 customers who had spent the most in the previous year. Used Kafka as a message broker to collect large volumes of data and analyze the collected data in the distributed system.
  • Used Linux for shell scripting alongside tools such as Scala and NumPy.
  • Used Linux to connect to AWS to store and retrieve data from it.
  • Splunk ES was used for application management, security, performance management, and analytics for the public APIs.
  • Worked on data generation, machine learning for anti-fraud detection, data modeling, operational decisions, and loss forecasting for cases such as product-specific fraud and buyer vs. seller fraud.
  • Used Linux to run all big data workloads.
  • Knowledge of designing and creating MicroStrategy reports, documents, and interactive dashboards.
  • Used Monte Carlo simulation algorithms to obtain numerical results by running simulations many times in succession in order to calculate probabilities with machine learning. Analyzed data for Fraud Analysis and Direct Fraud.
  • Used K-fold cross-validation to improve model performance and to test the model on sample data before finalizing it.
  • Worked with public/private cloud computing technologies (IaaS, PaaS, and SaaS) and Amazon Web Services (AWS), and worked on customer analytics and predictions.
  • Used Kibana and Tableau as business intelligence tools to visually analyze data and show its trends, variations, and density in graphs and charts.
  • Formulated procedures for integrating R programming plans with data sources and delivery systems, and used R for prediction.
  • Used query languages such as SQL, Hive, and Pig, and worked with NoSQL databases such as MongoDB, Cassandra, and HBase.
  • Big data analysis: Hadoop, MapReduce, NoSQL, Pig/Hive, Spark/Shark, MLlib, Scala, NumPy, SciPy, pandas, and scikit-learn.
  • Applied machine learning algorithms to both structured and unstructured data, including linear regression, logistic regression, decision trees, random forests, support vector machines, neural networks, KNN, and time series analysis.
  • Connected Tableau to files, relational sources, and big data sources to acquire and process data, and used it to show the trends, variations, and density of the data in graphs and charts.
  • Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
  • Created and executed complex SQL statements in both SQL production and development environments.

Confidential, St. Louis, MO

Data Analyst


  • Implemented various machine learning algorithms (Linear Regression, Logistic Regression, Decision Tree, SVM, Naive Bayes, KNN, K-Means, Random Forest, Gradient Boosting, and AdaBoost) on datasets from the UCI Machine Learning Repository.
  • Built a data pipeline framework using Python for data extraction, data wrangling, and data loading into Oracle SQL and Apache HDFS using Pig and Hive.
  • Involved in data analysis for data conversion, including data mapping from source to target database schemas, specification, and writing data extract scripts and data conversion programs in test and production environments.
  • Data warehouse: designed and programmed ETL and aggregation of data in the target database, working with staging, de-normalized, and star schemas and dimensional reporting.
  • Developed business predictive/historic analysis, Data Mining/Text Mining using Python with pandas and R Studio.
  • Integrated new tools and developed technology frameworks/prototypes to accelerate the data integration process and empower the deployment of predictive analytics by developing Spark Scala modules with R.
  • Developed and implemented predictive analysis using R to support the decision-making process of management and business users.
  • Wrote several Teradata SQL queries using Teradata SQL Assistant for ad hoc data pull requests.
  • Developed Python programs to manipulate data read from various Teradata sources and convert it into a single CSV file.
  • Performed statistical data analysis and data visualization using Python and R.
  • Worked on creating filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Collaborated with other data scientists and architected custom data visualization solutions using tools such as Tableau, R packages, and R Shiny.
  • Implemented data refreshes on Tableau Server for biweekly and monthly increments based on business change to ensure that the views and dashboards were displaying the changed data accurately.
  • Performed Tableau administration using Tableau admin commands.
  • Involved in running MapReduce jobs for processing millions of records.
  • Responsible for data modeling in HBase as per requirements, and for managing and scheduling jobs on a Hadoop cluster using Oozie.
  • Worked with Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.
  • Designed and developed ETL processes using Informatica for dimension and fact file creation.
  • Responsible for business case analysis, requirements gathering, use case documentation, prioritization, product/portfolio strategic roadmap planning, high-level design, and the data model.
  • Responsible for end-to-end solution delivery, including sprint planning and execution, change management, project management, operations management, and UAT.

Confidential, Thousand Oaks, CA

Data Analyst


  • Worked with business requirements analysts and subject matter experts to identify and understand requirements. Conducted user interviews and data analysis review meetings.
  • Defined key facts and dimensions necessary to support the business requirements along with the Data Modeler.
  • Created draft data models for understanding and to help the Data Modeler.
  • Resolved data-related issues such as assessing data quality, consolidating data, and evaluating existing data sources.
  • Manipulated, cleansed, and processed data using Excel, Access, and SQL.
  • Enhanced and deployed SSIS packages from the development server to the production server.
  • Responsible for loading, extracting, and validating client data.
  • Extensively involved in designing SSIS packages to export data from flat file sources to a SQL Server database.
  • Coordinated with the front-end design team to provide them with the necessary stored procedures and packages and the necessary insight into the data.
  • Participated in requirements definition, analysis, and the design of logical and physical data models.
  • Led data discovery discussions with the business in JAD sessions and mapped the business requirements to logical and physical modeling solutions.
  • Conducted data model reviews with project team members and captured technical metadata through data modeling tools.
  • Coded standard Informatica ETL routines and developed standard Cognos reports.
  • Collaborated with ETL teams to create data landing and staging structures as well as source-to-target mapping documents.
  • Ensured data warehouse database designs efficiently supported BI and end user requirements.
  • Collaborated with application and services teams to design databases and interfaces that fully meet business and technical requirements.
  • Maintained expertise and proficiency in the various application areas.
  • Maintained current knowledge of industry trends and standards.


Tableau Developer


  • Responsible for gathering requirements from the business and preparing the requirement specification document and mapping document.
  • Designed and developed Tableau reports to ensure all business requirements were met and to provide enhanced solutions that meet client needs.
  • Optimized the performance of dashboards with large volumes of data.
  • Provided mockup dashboards to the business.
  • Extracted data from different sources, connected them, and used the combined data to create dashboards.
  • Used data blending to merge two different datasets.
  • Reviewed basic SQL queries and edited inner, left, and right joins in Tableau Desktop, connecting to both live/dynamic sources and extracts.
  • Prepared dashboards using calculations, parameters, calculated fields, groups, sets, and hierarchies in Tableau.
  • Worked with groups, hierarchies, and sets to create detail-level summary reports and dashboards using KPIs.
  • Worked extensively with advanced analysis features: actions, calculations, parameters, background images, maps, trend lines, statistics, and log axes.
  • Generated dashboards to show data trends using statistics and KPIs.
  • Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to understand the data on the fly and use quick filters to get information on demand.
  • Combined views and reports into interactive dashboards in Tableau Desktop that were presented to business users, program managers, and end users.
  • Integrated Cognos reports with Tableau for more detailed reporting, using URL actions and passing filter values from Tableau to Cognos.
  • Extensively used Tabadmin and Tabcmd commands to create and restore backups of the Tableau repository.
  • Refreshed and published data and dashboards to Tableau Server. Scheduled data source refreshes using Windows batch files with Tabcmd scripts.
  • Defined best practices for Tableau report development.
  • Created users, groups, projects, and workbooks, along with the appropriate permission sets for Tableau Server logons and security checks.
  • Provided 24/7 production support for Tableau users.


SQL Developer


  • Coded based on the technical design, following coding standards.
  • Tuned the performance of complex, time-consuming queries.
  • Developed automation tools to reduce manual effort, thereby reducing the cost of the resources involved.
  • Provided fixes for issues through thorough root cause analysis and defect impact analysis.
  • Build and release management.
  • Analyzed the query plans of stored procedures and made the necessary changes to the stored procedures.
  • Provided developers with solutions for tuning stored procedures according to the coding guidelines provided by the organization and Microsoft.
  • Customized the applications to comply with the organization's own standards.
  • Responsible for the design and development of builds and SQL stored procedures and for hosting the application, including source code control and issue tracking.
  • Developed tables, stored procedures, views, functions, and triggers to apply automated rules and update related tables using SQL Server 2005.
  • Involved in GUI design for the project using Master and Content Pages along with User Controls.
  • Worked on Grid Views to edit, display, and modify case information.
  • The application tracks code that violates the coding standards.
