Data Scientist Resume

Houston, TX

PROFESSIONAL SUMMARY:

  • Qualified Data Scientist/Data Analyst with 8+ years of experience in Data Science and Analytics, including Data Mining and Statistical Analysis.
  • Hands-on experience and comprehensive industry knowledge of Machine Learning, Data Analysis, Statistical Modeling, Data Mining, Text Mining & Natural Language Processing, R, Python, and SQL.
  • Data Scientist who undertakes complex assignments, meets tight deadlines, and delivers superior performance. Possesses practical knowledge in Data Analytics and Optimization.
  • Data Scientist with proven expertise in Data Analysis, Machine Learning, and Modeling.
  • Experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, K-Means Clustering, and Association Rules (see the sketch after this list).
  • Involved in the entire data science project life cycle, including data cleaning, data extraction, and data visualization with large structured and unstructured data sets; created ER diagrams and schemas.
  • Connect with business stakeholders to gather requirements and present insights and analysis results.
  • Experience in developing data models and programs that serve required functionality and custom needs.
  • Excellent understanding of Agile and Scrum development methodologies.
  • Used version control tools such as Git 2.x and build tools such as Apache Maven/Ant.
  • Experienced in the full software lifecycle under SDLC, Agile, DevOps, and Scrum methodologies, including creating requirements and test plans.
  • Implemented a novel algorithm for the test and control team using Spark/Scala, Oozie, Confidential, and Python on a P&G YARN cluster.
  • Experience in working with R and Python for Statistical Analysis and Numerical Analysis, along with SQL, across various environments and business processes.
  • Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMS such as SQL Server 2008 and NoSQL databases such as MongoDB 3.2.
  • Strong experience in Big Data technologies such as Spark 1.6, SparkSQL, PySpark, Hadoop 2.x, Confidential, and Hive 1.x.
  • Experience in visualization tools such as Tableau 9.x/10.x for creating dashboards.
  • Expertise in Enterprise Data Warehousing, including Data Modeling, Data Architecture, Data Integration (ETL/ELT), and Business Intelligence.
  • Skilled in implementing SQL tuning techniques such as Join Indexes (JI), Aggregate Join Indexes (AJI), statistics collection, and table changes including indexes.
  • Experienced in using various Teradata utilities such as Teradata Parallel Transporter (TPT), MultiLoad, BTEQ, FastExport, and FastLoad.
  • Extensive experience in developing and designing ETL methodology to support data transformation and processing in a corporate-wide environment using Teradata, Mainframes, and UNIX shell scripting.
  • Experienced in dimensional data modeling using ER/Studio, Erwin, and Sybase PowerDesigner; Star/Snowflake schema modeling; FACT & Dimension tables; and conceptual, logical, and physical data modeling.
  • Good experience in production support: identifying root causes, troubleshooting, and submitting change controls.
  • Experienced in handling domain and technical interactions with application users, analyzing client business processes, and documenting business requirements.
  • Strong analytical and problem-solving skills and a quick learner; a committed team player capable of working to tight project delivery schedules and deadlines.
  • Experienced in writing Design Documents, System Administration Documents, Test Plans & Test Scenarios/Test Cases and documentation of test results.
  • Extensive experience in development of T-SQL, OLAP, PL/SQL, Stored Procedures, Triggers, Functions, Packages, performance tuning and optimization for business logic implementation.
  • Responsible for architecture design, data modeling, and implementation of Big Data platform and analytic applications.
  • Experienced in Data Scrubbing/Cleansing, Data Quality, Data Mapping, Data Profiling, Data Validation in ETL.
  • Proficient in handling complex processes using SAS/Base, SAS/SQL, SAS/STAT, SAS/Graph, Merge, Join and Set statements, and SAS/ODS.
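
As a brief illustration of the algorithms listed above, the following is a minimal sketch assuming scikit-learn and a synthetic data set; the resume names the algorithms but not a specific library.

    # Minimal sketch: logistic regression and k-means on synthetic data.
    # scikit-learn and the toy data set are assumptions for illustration.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans
    from sklearn.metrics import accuracy_score

    # Synthetic stand-in for a cleaned, feature-engineered data set.
    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Supervised: logistic regression on a binary target.
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))

    # Unsupervised: k-means segmentation of the same records.
    segments = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)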

TECHNICAL SKILLS:

Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/R Studio, SAS Enterprise Guide, SAS, R (Caret, Weka, ggplot), Perl, Confidential, Mathematica, FORTRAN, DTD, Schemas, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras), JavaScript, Shell Scripting

NoSQL Databases: Cassandra, Confidential, MongoDB, MariaDB

Business Intelligence Tools: Tableau Server, Tableau Reader, Tableau, Splunk, SAP Business Objects, OBIEE, SAP Business Intelligence, QlikView, Amazon Redshift, Azure Data Warehouse

Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans.

Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall

Build Tools: Jenkins, Toad, SQL Loader, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SOAP UI

Reporting Tools: MS Office (Word/Excel/PowerPoint/ Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.

BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimensions tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

PROFESSIONAL EXPERIENCE:

Confidential, Houston, TX

Data Scientist

Responsibilities:

  • Performed data profiling to learn about user behavior and merged data from multiple data sources.
  • Implemented big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem components such as Confidential, Hive, and Confidential.
  • Designed and developed various machine learning frameworks using Python, R, and Confidential.
  • Responsible for data identification, collection, exploration, and cleaning for modeling; participated in model development.
  • Performed data cleaning, feature scaling, and feature engineering.
  • Responsible for loading, extraction, and validation of client data.
  • Integrated R into MicroStrategy to expose metrics based on more sophisticated and detailed models than those natively available in the tool.
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Processed huge datasets (over a billion data points, over 1 TB) for data association pairing and provided insights into meaningful associations and trends.
  • Developed cross-validation pipelines for testing the accuracy of predictions (see the sketch after this list).
  • Enhanced statistical models (linear mixed models) for predicting the best products for commercialization using machine learning: linear regression, KNN, and K-means clustering.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; performed gap analysis; handled data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and SmartView.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Understanding and implementation of text mining concepts, graph processing, and semi-structured and unstructured data processing.
  • Developed documents and dashboards of predictions in MicroStrategy and presented them to the business intelligence team.
  • Good knowledge of Hadoop architecture and components such as Confidential, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As an architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Used Teradata 15 utilities such as FastExport and MultiLoad to handle data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into Confidential.
  • Collaborated with data engineers to implement the ETL process; wrote and optimized SQL queries to extract data from the cloud and merge it with data from Oracle 12c.
  • Collected unstructured data from MongoDB 3.3 and completed data aggregation.
  • Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R 3.4.0.
  • Performed data visualization with Tableau 10 and generated dashboards to present the findings.
  • Determined customer satisfaction and helped enhance the customer experience using NLP.
  • Good understanding of the Hadoop framework, MapReduce, and other Big Data tools such as Hive and PySpark.
  • Worked on SQL Server components: SSIS (SQL Server Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services).
  • Good analytical and communication skills; able to work independently with minimal supervision and as part of a team.
  • Hands-on experience with commercial data mining tools such as Splunk, R, MapReduce, YARN, Confidential, Hive, Flume, Oozie, Scala, Confidential, Master Confidential, Sqoop, and Spark (machine learning tooling).
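
As a concrete illustration of the cross-validation pipelines mentioned above, here is a minimal sketch; scikit-learn and the synthetic data are assumptions, since the original work is not shown.

    # Cross-validation pipeline sketch (scikit-learn assumed for illustration).
    from sklearn.datasets import make_classification
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, n_features=8, random_state=0)

    # Chaining scaling and KNN ensures the scaler is re-fit inside every fold,
    # so no information leaks from the held-out fold into preprocessing.
    pipe = Pipeline([("scale", StandardScaler()),
                     ("knn", KNeighborsClassifier(n_neighbors=5))])
    scores = cross_val_score(pipe, X, y, cv=5)
    print("5-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))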

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, GIT, Unix, Python 3.5.2, MLLib, SAS, regression, logistic regression, QlikView.

Confidential, Columbia, MD

Data Scientist

Responsibilities:

  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into Confidential.
  • Utilized Apache Spark with Python to develop and execute Big Data Analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
  • Designed a machine learning pipeline using Microsoft Azure Machine Learning for predictive and prescriptive analytics, and implemented a machine learning scenario for a given data problem.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Involved in business process Modeling using UML.
  • Participated in the installation of SAS/EBI on LINUX platform
  • Extensively worked on data modeling tools such as Erwin Data Modeler to design the data models.
  • Provide expertise and recommendations for physical database design, architecture, testing, performance tuning and implementation.
  • Developed and maintained data dictionary to create metadata reports for technical and business purpose.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS, and PL/SQL.
  • Experience in maintaining database architecture and metadata that support the Enterprise Data warehouse.
  • Designed, coded, and unit tested ETL packages for source marts and subject marts using Informatica ETL processes for an Oracle database.
  • Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files, and big data.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
  • Used big data tools Spark (PySpark, SparkSQL, MLlib) to conduct real-time analysis of loan defaults on AWS (see the sketch after this list).
  • Conducted data blending and data preparation using Alteryx and SQL for Tableau consumption, and published data sources to Tableau Server.
  • Worked very closely with Data Architects and the DBA team to implement data model changes in the database in all environments.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
  • Improved the performance of existing data warehouse applications to increase the efficiency of the existing system.
  • Designed and developed Use Case, Activity Diagrams, Sequence Diagrams, OOD (Object-oriented Design) using UML and Visio.
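
The loan-default analysis mentioned above could look roughly like the following; this is a hedged sketch, and the input path, feature columns, and label are hypothetical placeholders rather than the client's actual schema.

    # PySpark Spark ML sketch of a loan-default model; the path and columns
    # are hypothetical placeholders for illustration only.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("loan-default").getOrCreate()
    loans = spark.read.parquet("s3://bucket/loans/")  # hypothetical input

    # Assemble assumed numeric features into the vector column Spark ML expects.
    assembler = VectorAssembler(
        inputCols=["loan_amount", "income", "credit_score"],  # assumed columns
        outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="defaulted")
    model = Pipeline(stages=[assembler, lr]).fit(loans)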

Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 12c, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, RMSE, OOD, PySpark, Random Forest.

Confidential - NC

Data Analyst

Responsibilities:

  • Wrote several Teradata SQL Queries using Teradata SQL Assistant for Ad Hoc Data Pull request.
  • Developed Python programs to read data from various Teradata tables, manipulate it, and consolidate it into a single CSV file (see the sketch after this list).
  • Performed statistical data analysis and data visualization using Python and R.
  • Worked on creating filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Created data models in Splunk using pivot tables by analyzing the vast amount of data and extracting key information to suit various business requirements.
  • Created new scripts for Splunk scripted input for collecting CPU, system and OS data.
  • Interacted with other data scientists and architected custom solutions for data visualization using tools like Tableau, R packages, and R Shiny.
  • Implemented data refreshes on Tableau Server for biweekly and monthly increments based on a business change to ensure that the views and dashboards were displaying the changed data accurately.
  • Maintained large data sets, combining data from various sources with Excel, SAS Grid, SAS Enterprise Guide, Access, and SQL queries.
  • Analyzed data sets with SAS programming, R, and Excel.
  • Published interactive dashboards and scheduled automatic data refreshes.
  • Performed Tableau administration using Tableau admin (tabadmin) commands.
  • Created Hive queries that helped market analysts spot emerging trends by comparing incremental data with Teradata reference tables and historical metrics.
  • Responsible for creating Hive tables, loading the structured data resulted from Map Reduce jobs into the tables and writing hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Developed normalized Logical and Physical database models for designing an OLTP application.
  • Knowledgeable in the AWS environment for loading data files from on-premises systems to a Redshift cluster.
  • Performed SQL Testing on AWS Redshift databases
  • Developed Teradata SQL scripts using OLAP functions such as RANK() OVER to improve query performance while pulling data from large tables.
  • Involved in running Map Reduce jobs for processing millions of records.
  • Wrote complex SQL queries using joins and OLAP functions such as CSUM, COUNT, and RANK.
  • Involved in extensive routine operational reporting, ad hoc reporting, and data manipulation to produce routine metrics and dashboards for management.
  • Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
  • Experienced in migrating HiveQL to Impala to minimize query response time.
  • Responsible for Data Modeling as per our requirement in Confidential and for managing and scheduling Jobs on a Hadoop cluster using Oozie jobs.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.
  • Design and development of ETL processes using Informatica ETL tool for dimension and fact file creation.
  • Develop and automate solutions for a new billing and membership Enterprise data Warehouse including ETL routines, tables, maps, materialized views, and stored procedures incorporating Informatica and Oracle PL/SQL toolsets.
  • Performed analysis on implementing Spark using Scala and wrote sample Spark programs using PySpark.
  • Created a UDF to calculate the pending payment for a given residential or small-business customer's quotation data, used in Confidential and Hive scripts.
  • Experienced in moving data from Hive tables into Confidential for real-time analytics on Hive tables.
  • Handled importing data from various data sources and performed transformations using Hive (external tables, partitioning).
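
A minimal sketch of the Teradata-to-CSV consolidation described above; the teradatasql driver is one possible choice (an assumption), and the host, credentials, and table names are placeholders.

    # Pull several Teradata tables and consolidate them into one CSV.
    # teradatasql is an assumed driver; connection details are placeholders.
    import pandas as pd
    import teradatasql

    tables = ["sales_daily", "sales_monthly"]  # hypothetical source tables
    with teradatasql.connect(host="tdhost", user="user", password="***") as con:
        frames = [pd.read_sql(f"SELECT * FROM {t}", con) for t in tables]

    # Combine the individual pulls into a single CSV file.
    pd.concat(frames, ignore_index=True).to_csv("combined.csv", index=False)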

Environment: SQL/Server, Oracle 9i, MS-Office, Teradata, Informatica, ER Studio, XML, Hive, Confidential, Flume, Sqoop, R connector, Python, R, Tableau 9.2.

Confidential

Java Developer

Responsibilities:

  • Involved in all the test cases and fixed any bugs or any issues identified during the testing period.
  • Worked on IE Developer tools to debug given HTML.
  • Wrote test cases for unit testing using JUnit.
  • Implemented logging mechanism using log4j.
  • Created Restful web service in Doc-delete application to delete documents older than given expiration date.
  • Followed the Agile development methodology throughout and tested the application in each iteration.
  • Designed and Developed websites using CXML and REST for Confidential and multiple other clients.
  • Migrated production database from SQL 2000 to SQL 2008 and upgraded production JBOSS application servers.
  • Designed user interfaces using JavaScript, Ajax, CSS, and jQuery.
  • Used Swing for sophisticated GUI components.
  • Writing Java utility classes.
  • Troubleshooting and resolving defects.
  • Used IntelliJ as the IDE for application development and integration of the frameworks.
  • Designed the application by implementing Struts 2.0 MVC Architecture.
  • Development, enhancement, maintenance, and support of Java J2EE applications.
  • Developed JSP and Servlets to dynamically generate HTML and display the data to the client side.
  • Implemented JSON along with Ajax to improve the processing speed.
  • Deployed the applications on Tomcat Application Server.
  • Prepared high and low level design documents for the business modules for future references and updates.

Environment: Teradata, SAS/Access, SAS SQL, MS Excel, Python Pandas, RDBMS, HiveQL.

Confidential

Java Developer

Responsibilities:

  • Involved in coding using Java Servlets; created web pages using JSPs to generate pages dynamically.
  • Involved in developing forms using HTML.
  • Developed Enterprise Java beans for the business flow and business objects.
  • Designing, coding and configuring server side J2EE components like JSP, Servlets, Java Beans, XML.
  • Responsible for implementing the business requirement using the Spring core, Spring boot and Spring data.
  • Extensive use of Struts Framework for Controller components and view components.
  • Used XML for client communication; consumed and created RESTful web services.
  • Developed the database interaction classes using JDBC and Java.
  • Rigorously followed Test-Driven Development (TDD) in coding.
  • Implemented Action classes and server-side validations for account activity, payment history, and transactions.
  • Implemented views using Struts tags, JSTL 2.0, and Expression Language.
  • Worked with various Java patterns, such as Service Locator and Factory, at the business layer for effective object behavior.
  • Used Hibernate to transfer the application data between client and server.
  • Worked on the JAVA Collections API for handling the data objects between the business layers and the front end
  • Worked with JAXB, SAXP, and XML Schema to export data to XML format and import data from XML format into the database, and used JAXB in the web service request/response data marshalling and unmarshalling process.
  • Responsible for coding MySQL Statements and Stored procedures for back end communication using JDBC.
  • Developed an API to write XML documents from a database. Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
  • Developed a RESTful web service using the Spring framework.
  • Involved in implementing the Hibernate API for database connectivity.
  • Maintained the source code; designed, developed, and deployed on Apache Tomcat Server.
  • Used Maven for continuous integration of the builds and ANT for deploying the web applications.

Environment: Java/J2EE, Core Java, Jdk1.6, Spring Boot, Hibernate, Webservices, JAX-RS, Mockito, WADL, SOAPUI, JSP, JDBC, jQuery, AJAX, Html, CSS, Maven, log4j, Oracle, MS SQL, PL/SQL, SQL Developer, JIRA, JMS, APACHE AXIS, Source Tree, IntelliJ, GIT, UNIX, AGILE-SCRUM.

Confidential

Java Developer

Responsibilities:

  • Developed the database interaction classes using JDBC and Java.
  • Rigorously followed Test-Driven Development (TDD) in coding.
  • Implemented Action classes and server-side validations for account activity, payment history, and transactions.
  • Implemented views using Struts tags, JSTL 2.0, and Expression Language.
  • Worked with various Java patterns, such as Service Locator and Factory, at the business layer for effective object behavior.
  • Used Hibernate to transfer the application data between client and server.
  • Worked on the JAVA Collections API for handling the data objects between the business layers and the front end.
  • Followed the Agile development methodology throughout and tested the application in each iteration.
  • Designed and Developed websites using CXML and REST for Confidential and multiple other clients.
  • Migrated production database from SQL 2000 to SQL 2008 and upgraded production JBOSS application servers.
  • Designed user interfaces using JavaScript, Ajax, CSS, and jQuery.
  • Used Swing for sophisticated GUI components.
  • Writing Java utility classes.
  • Troubleshooting and resolving defects.
  • Used IntelliJ as the IDE for application development and integration of the frameworks.
  • Designed the application by implementing Struts 2.0 MVC Architecture.
  • Development, enhancement, maintenance, and support of Java J2EE applications.
  • Developed JSP and Servlets to dynamically generate HTML and display the data to the client side.

Environment: Core Java, SQL (DB2), Design Patterns, Spring, OOPS/OOAD (UML), XML, Hibernate, DOJO 1.5, Eclipse IDE, Tortoise SVN source control, Bugzilla, Autosys, Aqua Studio, JIRA, Cygwin.
