
Data Scientist Resume


Chantilly, VA

PROFESSIONAL SUMMARY:

  • Overall 8+ years’ experience in Data Analysis, Data Profiling, Data Integration, Migration, Data Governance and Metadata Management, Master Data Management and Configuration Management.
  • Extensive experience across software development phases, including requirements gathering, data analysis, and design, with strong documentation skills.
  • Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and building data visualizations using Python.
  • Experience in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Extensive knowledge of Apache Hadoop technologies like Pig, Hive, Sqoop, Spark, Flume and HBase.
  • Experience in Tableau Desktop, Tableau Server, Tableau Reader and Tableau Public in versions 8.x and 9.0.
  • Extensive experience on building Analytic Dashboards using Tableau.
  • End to end experience in designing and deploying data visualizations using Tableau.
  • Experience in integration of various relational and non-relational sources such as DB2, Teradata, Oracle, Netezza, SQL Server, NoSQL, COBOL, XML and flat files into a Netezza database.
  • Good knowledge of complex data types in Pig and MapReduce for handling and formatting data as required.
  • Experience working with tools and technologies such as Tableau, Birst, Power BI, MicroStrategy, Informatica, Cognos, and Ab Initio.
  • Exposure to Python, R and Big Data and Hadoop technologies such as Spark, Pig, Hive, MapReduce.
  • Exploring opportunities in data science, including deep learning, natural language processing, and artificial intelligence.
  • Experience in designing Star Schema and Snowflake Schema for Data Warehouses using tools like ERwin Data Modeler, PowerDesigner and Embarcadero ER/Studio.
  • Experience in big data analysis and developing data models using Hive, Pig, MapReduce, and SQL, with strong data architecting skills for designing data-centric solutions.
  • Hands on experience with modeling using ERwin in developing Entity-Relationship, modeling Transactional Databases and Data Warehousing, Dimensional Data Modeling for Data Marts and Fact & Dimensional Tables.
  • Extensive Experience working on SQL Queries along with good experience in development of T-SQL, Oracle PL/SQL Scripts, Stored Procedures and Triggers for business logic implementation.
  • Experience in Data modeling for Data Mart/Data Warehouse development including conceptual, logical and physical model design, developing Entity Relationship Diagram (ERD), reverse/forward engineer (ERD) with CA ERwin data modeler.
  • Experience in Logical Data Model (LDM) and Physical Data Models (PDM) using Erwin data modeling tool.
  • Hands on experience on tools like R, SQL, SAS and Tableau.
  • Experience in migration of data from Excel, flat files, and Oracle to MS SQL Server using SQL Server Integration Services (SSIS).
  • Good knowledge of developing Informatica Mappings, Mapplets, Sessions, Workflows and Worklets for data loads from various sources such as Oracle, Flat Files, DB2, SQL Server etc.
  • Experience in process improvement, Normalization/De-normalization, data extraction, data cleansing, and data manipulation.
  • Good understanding and working experience of industry-standard methodologies like the System Development Life Cycle (SDLC), Rational Unified Process (RUP), and Agile methodologies.
  • Experience in ETL design, development and maintenance using Oracle SQL, PL/SQL, TOAD, SQL*Loader, and relational database management systems (RDBMS).
  • Experience in designing and developing data models for OLTP databases, the Operational Data Store (ODS), data warehouses (OLAP), and federated databases to support the client's enterprise Information Management Strategy.
  • Experience extracting data from various sources such as Oracle databases, flat files, and CSV files and loading it into target warehouses.
  • Experience transforming and loading data from heterogeneous data sources to SQL Server using SQL Server Integration Services (SSIS) packages.
  • Extensive experience in relational and dimensional data modeling, creating logical and physical database designs and ER diagrams using data modeling tools like ERwin and ER/Studio.
  • Experience in writing, testing and implementing SQL queries using advanced analytical functions.
  • Knowledge in writing SQL queries, and resolving key performance issues.
  • Good knowledge of Hadoop architecture and its components like HDFS, MapReduce, JobTracker, TaskTracker, NameNode and DataNode.
  • Integration Architect and Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL and Cloud technologies.

TECHNICAL SKILLS:

Data Science Tools: Machine Learning, Deep Learning, Data Warehousing, Data Mining, Data Analysis, Big Data, Data Visualization, Data Munging, Data Modeling

Database: MySQL, Hive, Microsoft SQL Server 2014/2012/2008/2005, Teradata, MS Access, PostgreSQL, Netezza, Oracle.

Analysis and Modeling Tools: ERwin r9.6/r9.5/r9.1/r8.x, Sybase PowerDesigner, Oracle Designer, BPwin, ER/Studio, MS Access 2000, Star-Schema and Snowflake-Schema Modeling, Fact and Dimension Tables, Pivot Tables.

Reporting Tools: Business Objects, MS Excel Reports, MS Access Reports, Tableau reports, SSRS, SSIS, Crystal Reports

Operating Systems: Windows, Linux, Unix

Languages: SQL, Python (Pandas, SciPy, Sklearn, Matplotlib)

Machine Learning: Regression, Clustering, Random forest, SVM

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, Crystal Reports 9, IPython, Spyder, Spark

ETL Tools: SSIS, Pentaho, Informatica Power Center 9.7/9.6/9.5/9.1

Big Data: Hadoop, MapReduce, HDFS 2, Hive, Pig, HBase, Sqoop, Spark.

PROFESSIONAL EXPERIENCE:

Confidential, Chantilly, VA

Data Scientist

Responsibilities:

  • Gathered, documented, and implemented business requirements for analysis or as part of long-term document/report generation. Analyzed large volumes of data and provided results to technical and managerial staff.
  • Worked with various data pools and DBAs to gain access to data. Knowledge of NLP, NLTK, and text mining.
  • Programming knowledge in Scala, Spark, SQL and Python.
  • Used K-means clustering to group similar data and documented the results.
  • Extracted, transformed, and loaded data into a Postgres database using Python scripts.
  • Data visualization with Pentaho, Tableau, D3, and Django web apps. Knowledge of numerical optimization, anomaly detection and estimation, A/B testing, statistics, and Maple. Big data analysis using Hadoop, MapReduce, NoSQL, Pig/Hive, Spark/Shark, MLlib, Scala, NumPy, SciPy, Pandas, and scikit-learn.
  • Worked to research and develop statistical learning models for data analysis. Collaborated with product management and engineering departments.
  • Used SAS for analyzing client business needs, managing large data sets, and storing and extracting information. Performed feature engineering to convert arbitrary data into well-behaved data, including handling categorical features, text features, image features, and missing data.
  • Used Hadoop (MapReduce/Hive/Pig) to store, process and analyze a huge volume of unstructured data covering 100 million users, for example to offer a gift card to the top 10 customers who spent the most in the previous year. Kafka was used as a message broker to collect large volumes of data and to analyze the collected data in the distributed system.
  • Splunk ES was used for application management, security, performance management, and analytics for the public APIs.
  • Worked on data generation, machine learning for anti-fraud detection, data modeling, operational decisioning, and loss forecasting, covering areas such as product-specific fraud and buyer vs. seller fraud.
  • Monte Carlo simulation algorithms were used to obtain numerical results by running simulations many times in succession in order to estimate probabilities alongside machine learning. Analyzed data for fraud analysis and direct fraud.
  • K-fold cross-validation was used to improve model performance and to test models on sample data before finalizing them.
  • Worked with public/private Cloud Computing technologies (IaaS, PaaS & SaaS) and Amazon Web Services (AWS) and worked for customer analytics and predictions.
  • Kibana and Tableau were used as business intelligence tools for visually analyzing the data and showing trends, variations and density of the data in the form of graphs and charts.
  • Formulated procedures for integrating R programs with data sources and delivery systems; R was used for prediction.
  • Used query languages such as SQL, Hive, Pig and experience with NoSQL databases, such as MongoDB, Cassandra, HBase.
  • Worked with both unstructured and structured data and machine learning algorithms such as linear and logistic regression, decision trees, random forests, support vector machines, neural networks, KNN, and time series analysis.
  • Keras, along with numerical computation libraries such as Theano and TensorFlow, was used for developing and evaluating deep neural network models.
  • Tableau was used for analyzing the data to show trends, variations and density of the data in the form of graphs and charts. Tableau was connected to files, relational and big data sources to acquire and process data.
  • Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
  • Created and executed complex SQL statements in both SQL production and development environments.
  • Used scikit-learn, Pandas, and the statsmodels Python libraries to build predictive forecasts for time series analysis using AR (autoregressive), MA (moving average), and ARIMA (autoregressive integrated moving average) models; see the sketch below.
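
A minimal sketch of the ARIMA-style forecasting workflow described above, using pandas and statsmodels; the input file, column names, and the (1, 1, 1) model order are illustrative assumptions, not project specifics.

```python
# Hedged sketch of ARIMA time-series forecasting with statsmodels.
# "monthly_sales.csv", its columns, and the (1, 1, 1) order are placeholders
# chosen only to keep the example self-contained.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Load a univariate series indexed by month (assumed file layout).
series = (
    pd.read_csv("monthly_sales.csv", parse_dates=["month"], index_col="month")["sales"]
    .asfreq("MS")
)

# Hold out the last 12 observations to evaluate the forecast.
train, test = series[:-12], series[-12:]

# order=(p, d, q): autoregressive terms, differencing, moving-average terms.
model = ARIMA(train, order=(1, 1, 1))
fitted = model.fit()

forecast = fitted.forecast(steps=len(test))
mae = abs(forecast.values - test.values).mean()
print(f"12-month holdout MAE: {mae:.2f}")
```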

Confidential, St Louis, MO

Data Analyst

Responsibilities:

  • Implemented various machine learning algorithms - Linear Regression, Logistic Regression, Decision Tree, SVM, Naive Bayes, KNN, K-Means, Random Forest, Gradient Boosting and AdaBoost - on datasets from the UCI Machine Learning Repository (see the sketch after this list).
  • Built a data pipeline framework using Python for data extraction, data wrangling and data loading into Oracle SQL and Apache HDFS using Pig and Hive.
  • Involved in data analysis for data conversion - including data mapping from source to target database schemas, specification and writing of data extract scripts/programming of data conversion, in test and production environments.
  • Data Warehouse - designed and programmed ETL and aggregation of data in the target database, working with staging, de-normalized and star schemas and dimensional reporting.
  • Developed predictive/historic business analysis and data mining/text mining using Python with pandas and RStudio.
  • Integrated new tools and developed technology frameworks/prototypes to accelerate the data integration process and empower the deployment of predictive analytics by developing Spark Scala modules with R.
  • Developed and implemented predictive analysis using R for management and business users to support the decision-making process.
  • Wrote several Teradata SQL Queries using Teradata SQL Assistant for Ad Hoc Data Pull request.
  • Developed Python programs to manipulate data read from various Teradata tables and consolidate it into a single CSV file.
  • Performed statistical data analysis and data visualization using Python and R.
  • Worked on creating filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Interacted with other data scientists and architected custom solutions for data visualization using tools like Tableau, packages in R, and R Shiny.
  • Implemented data refreshes on Tableau Server for biweekly and monthly increments based on business change to ensure that the views and dashboards were displaying the changed data accurately.
  • Performed Tableau administration using tabadmin commands.
  • Involved in running Map Reduce jobs for processing millions of records.
  • Responsible for Data Modeling as per our requirement in HBase and for managing and scheduling Jobs on a Hadoop cluster using Oozie jobs.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
  • Design and development of ETL processes using Informatica ETL tool for dimension and fact file creation.
  • Responsible for business case analysis, requirements gathering use case documentation, prioritization and product/portfolio strategic roadmap planning, high level design and data model.
  • Responsible for the end to end solutions delivery, including sprint planning and execution, change management, project management, operations management, and UAT.
  • Primary liaison between customer and engineering groups; served as the key unifying force for all BI platform activities, enabling better communication across teams, proactively identifying gaps, and ensuring the successful delivery of capabilities.
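
An illustrative sketch of benchmarking several of the classifiers named above with k-fold cross-validation in scikit-learn; the dataset (scikit-learn's built-in breast cancer data) and the hyperparameters are assumptions made only to keep the example self-contained.

```python
# Hedged sketch: cross-validated comparison of a few classifiers, in the spirit
# of the UCI-repository experiments above. Dataset and settings are placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm_rbf": make_pipeline(StandardScaler(), SVC()),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
}

# 5-fold cross-validated accuracy for each candidate model.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:20s} mean accuracy = {scores.mean():.3f}")
```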

Confidential, Thousand oaks, CA

Data Analyst

Responsibilities:

  • Worked with business requirements analysts/subject matter experts to identify and understand requirements. Conducted user interviews and data analysis review meetings.
  • Defined key facts and dimensions necessary to support the business requirements along with Data Modeler.
  • Created draft data models to build shared understanding and to help the Data Modeler.
  • Resolved data-related issues such as assessing data quality, consolidating data, and evaluating existing data sources.
  • Manipulated, cleansed and processed data using Excel, Access and SQL; see the sketch after this list.
  • Responsible for loading, extracting and validation of client data.
  • Coordinated with the front end design team to provide them with the necessary stored procedures and packages and the necessary insight into the data.
  • Participated in requirements definition, analysis and the design of logical and physical data models.
  • Led data discovery discussions with the business in JAD sessions and mapped the business requirements to logical and physical modeling solutions.
  • Conducted data model reviews with project team members and captured technical metadata through data modeling tools.
  • Coded standard Informatica ETL routines and developed standard Cognos reports.
  • Collaborated with ETL teams to create data landing and staging structures as well as source-to-target mapping documents.
  • Ensured data warehouse database designs efficiently supported BI and end-user requirements.
  • Collaborated with application and services teams to design databases and interfaces that fully met business and technical requirements.
  • Maintained expertise and proficiency in the various application areas.
  • Maintained current knowledge of industry trends and standards.
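
A brief, hedged sketch of the kind of cleansing and consolidation step described above, written with pandas; the file name, business key, and column names are hypothetical placeholders.

```python
# Hedged sketch of a data cleansing/consolidation pass.
# "clients.csv", "client_id", and "email" are illustrative names only.
import pandas as pd

raw = pd.read_csv("clients.csv", dtype=str)

cleaned = (
    raw
    .assign(email=lambda df: df["email"].str.strip().str.lower())  # standardize a text field
    .dropna(subset=["client_id"])                                  # reject rows missing the key
    .drop_duplicates(subset=["client_id"], keep="last")            # consolidate duplicate records
)

# Simple profiling counts of the kind used when assessing source data quality.
print("rows in:", len(raw), "rows out:", len(cleaned))
print(cleaned.isna().sum())

cleaned.to_csv("clients_clean.csv", index=False)
```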

Environment: Informatica 9.1, Oracle 11g, SQL Developer, PL/SQL, Cognos, Splunk, TOAD, MS Access, MS Excel

Confidential

Tableau Developer

Responsibilities:

  • Responsible for requirement gathering from the business and prepared the requirement specification document and mapping document.
  • Designed and developed Tableau reports to ensure all the business requirements were met and provided an enhanced solution that meets client needs.
  • Performance optimization of dashboards with large volumes of data.
  • Provided mockup dashboards to the business.
  • Extracted data from different sources, connected them, and used the result to create dashboards.
  • Used data blending when merging two different datasets.
  • Reviewed basic SQL queries and edited inner, left, and right joins in Tableau Desktop by connecting to live/dynamic sources and extracts.
  • Prepared dashboards using calculations, parameters, calculated fields, groups, sets and hierarchies in Tableau.
  • Worked with groups, hierarchies and sets to create detail-level summary reports and dashboards using KPIs.
  • Worked extensively with advanced analysis Actions, Calculations, Parameters, Background Images, Maps, Trend Lines, Statistics, and Log Axes.
  • Generated dashboards to show trends in the data using statistics and KPIs.
  • Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to understand the data on the fly with quick filters for on-demand information.
  • Combined views and reports into interactive dashboards in Tableau Desktop that were presented to Business Users, Program Managers, and End Users.
  • Integrated Cognos reports with Tableau for more detailed reporting, using URL actions, and passing filter values from Tableau to Cognos.
  • Extensively used tabadmin and tabcmd commands to create and restore backups of the Tableau repository.
  • Refreshed and published data and dashboards to Tableau Server; scheduled data source refreshes using Windows batch files with tabcmd scripts (see the sketch after this list).
  • Defined best practices for Tableau report development.
  • Worked on creation of users, groups, projects, workbooks and the appropriate permission sets for Tableau server logons and security checks.
  • Provided 24/7 production support for Tableau users.
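
A hedged Python stand-in for the Windows batch plus tabcmd refresh scripts mentioned above; the server URL, credentials, and data source name are placeholders, and tabcmd is assumed to be installed and on the PATH.

```python
# Hedged sketch of scripting a Tableau extract refresh with tabcmd.
# SERVER, USER, PASSWORD, and DATASOURCE are illustrative placeholders.
import subprocess

SERVER = "https://tableau.example.com"
USER = "svc_tableau"
PASSWORD = "********"
DATASOURCE = "Sales Extract"  # a published data source name (hypothetical)

def run(cmd):
    """Run a tabcmd command and fail loudly on a non-zero exit code."""
    print(">", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["tabcmd", "login", "-s", SERVER, "-u", USER, "-p", PASSWORD])
run(["tabcmd", "refreshextracts", "--datasource", DATASOURCE])
run(["tabcmd", "logout"])
```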

Environment: BI Tools: Tableau Desktop v8.2/9.0, Tableau Server v8.2/9.0.3, Cognos Report Studio 10.1.1, Cognos Query Studio; Modeling Tools: Cognos Framework Manager; Database: SQL Server 2008 R2, 2012.

Confidential

SQL Developer

Responsibilities:

  • Coding based on the technical design and following coding standards.
  • Performance tuning of complex, time-consuming queries.
  • Developed automation tools to reduce manual effort, thereby reducing the cost of the resources involved.
  • Provided fixes to issues by performing thorough root cause analysis and assessing the impact of each defect.
  • Build and release management.
  • Reviewed the query plans of stored procedures and made the necessary changes to them; see the sketch after this list.
  • Provided solutions to developers for tuning stored procedures according to the coding guidelines provided by the organization and Microsoft.
  • Customized applications to comply with the organization's own standards.
  • Responsible for the design and development of builds, SQL stored procedures, hosting the application including source code control and issue tracking.
  • Developed Tables, Stored procedures, Views, Functions and Triggers to perform automated rules, updating to related tables using SQL Server 2005.
  • Involved in GUI design for the project using Master and Content Pages along with User Controls.
  • Worked on GridViews to edit/display/modify case information.
  • The application tracks code that violates the coding standards.
  • Coding violation results were exported to Excel and PDF through the FxCop report analyzer tool, and the necessary actions suggested by Microsoft were applied.
  • Performed complete unit and system testing of web applications, engaging the development team as necessary.
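
A hedged sketch of one way to pull the estimated query plan for a stored procedure from SQL Server via Python and pyodbc, in the spirit of the query-plan review above; the connection string, database, and procedure name (dbo.usp_GetCases) are hypothetical, while SET SHOWPLAN_XML is standard T-SQL.

```python
# Hedged sketch: retrieve an estimated execution plan for a stored procedure.
# Connection details and dbo.usp_GetCases are placeholders, not real objects.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=CasesDB;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# With SHOWPLAN_XML ON, SQL Server returns the estimated plan instead of running the batch.
cursor.execute("SET SHOWPLAN_XML ON")
cursor.execute("EXEC dbo.usp_GetCases @Status = N'Open'")
plan_xml = cursor.fetchone()[0]
cursor.execute("SET SHOWPLAN_XML OFF")
conn.close()

# Inspect the plan text for table scans, implicit conversions, missing indexes, etc.
print(plan_xml[:1000])
```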
