Data Scientist Resume
Washington, DC
SUMMARY
- 8+ years of experience in IT, including 5+ years as a Data Scientist, with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions
- Hands-on experience with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering and dimensionality reduction
- Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python and Tableau
- Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiments), machine learning, algorithms, data structures and data infrastructure
- Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA and Ensemble methods
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Solid team player, team builder, and an excellent communicator.
- Extensive hands-on experience and high proficiency with structured, semi-structured and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, scikit-learn and Hadoop MapReduce
- Technical proficiency in designing and data modeling online applications, and as Solution Lead architecting Data Warehouse/Business Intelligence applications.
- Expertise in the implementation of core concepts of Java, JEE technologies, JSP, Servlets, JSTL, EJB, JMS, Struts, Spring, Hibernate, JDBC, XML, Web Services, and JNDI.
- Extensive experience working in Test-Driven Development and Agile Scrum environments.
- Experience working on Windows, Linux and UNIX platforms, including programming and debugging skills in UNIX shell scripting.
- Flexible across Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, Cosmos.
- Defined job flows in Hadoop environments using tools like Oozie for data scrubbing and processing.
- Experience in Data migration from existing data stores to Hadoop.
- Developed MapReduce programs to perform data transformation and analysis.
- Experience in analyzing data with Hive and Pig using schema-on-read.
- Created Development Environments in Amazon Web Services using services like VPC, ELB, EC2, ECS and RDS instances
- Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification and Testing, in both Waterfall and Agile methodologies
- Proficient in Data Science programming using R, Python and SQL
- Proficient in SQL, Database, Data Modeling, Data Warehousing, ETL and reporting tools
- Strong knowledge of NoSQL databases like HBase, Cassandra and MongoDB, and their integration with Hadoop clusters.
- Proficient in using AJAX for implementing dynamic Web Pages.
TECHNICAL SKILLS
Databases: Oracle, MySQL, MS SQL Server, Sybase, PostgreSQL, MongoDB, NoSQL.
Hadoop/BigData: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, ZooKeeper, Flume, HBase
Programming Languages: Python, SQL, T-SQL, Matlab, C, C++, HTML, PL/SQL, XML, DHTML, HTTP, Java, Hadoop.
Database Design Tools and Data Modeling: Physical & logical data modeling, Dimension tables, Kimball.
IDE: Eclipse, IntelliJ, NetBeans, IBM Rational Application Developer (RAD)
Web Servers: JBoss, WebLogic, WebSphere, Tomcat, Jetty, Apache
Reporting Tools: Shiny, Power BI, Tableau, Jasper Reports, BIRT, Crystal Reports.
Statistical Software: SPSS, R, SAS
Tools and Utilities: Crystal Reports, Power Pivot, DTS, SQL Server Enterprise Manager, SQL Server Profiler, Microsoft Management Console, Visual SourceSafe 6.0, Excel Power Pivot, SQL Server, ProClarity, Microsoft Office 2007/10/13, Visual Studio v14, Excel Data Explorer, Tableau 8/10, Import & Export Wizard, .NET.
Operating Systems: UNIX and Linux, Microsoft Windows 8/7/XP.
PROFESSIONAL EXPERIENCE
Confidential, Washington DC
Data Scientist
Responsibilities:
- Responsible for design and development of advanced R/Python programs to prepare, transform and harmonize data sets in preparation for modelling.
- Responsible for applying machine learning techniques (regression/classification) to predict outcomes.
- Designed the prototype of the Data Mart and documented its possible outcomes for end users.
- Identified and executed process improvements; hands-on with various technologies such as Oracle, Informatica and BusinessObjects.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Involved in business process modelling using UML.
- Interacted with Business Analysts, SMEs and other Data Architects to understand business needs and functionality for various project solutions.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Worked closely with business, data governance, SMEs and vendors to define data requirements.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus and PL/SQL.
- Designed, coded and unit tested ETL packages for source marts and subject marts using Informatica ETL processes for Oracle database.
- Developed large data sets from structured and unstructured data. Performed data mining.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for clients.
- Partnered with modellers to develop data frame requirements for projects.
- Tracked various campaigns, generating customer profiling analysis and data manipulation.
- Performed ad hoc reporting, customer profiling and segmentation using R/Python.
- Gained good exposure to Big Data technologies and the Hadoop ecosystem (HDFS, MapReduce, Pig, Sqoop and Hive) for scalability, distributed computing and high-performance computing.
Environment: R/Python, Informatica 9.0, ODS, OLTP, Oracle 11g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, RequisitePro, Hadoop, SQL, PL/SQL, SQL*Loader, UML.
Confidential, Denver, CO
Data Scientist
Responsibilities:
- Worked as a Data Modeler/Analyst to generate data models using Erwin and developed a relational database system.
- Responsible for applying machine learning techniques (regression/classification) to predict outcomes.
- Performed performance tuning of the database, including indexing, optimizing SQL statements and monitoring the server.
- Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions and data formats.
- Collaborated on the data mapping document from source to target and the data quality assessments for the source data.
- Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
- Improved the performance of existing data warehouse applications to increase system efficiency.
- Created PL/SQL packages and database triggers, developed user procedures and prepared user manuals for the new programs.
- Provided expertise and recommendations for physical database design, architecture, testing, performance tuning and implementation.
- Designed and developed Use Case, Activity and Sequence Diagrams and OOD (Object-Oriented Design) using UML and Visio.
- Analyzed large datasets to answer business questions by generating reports and outcomes.
- Provided R/SQL programming, with detailed direction, in the execution of data analysis that contributed to the final project deliverables. Responsible for data mining.
- Executed SQL queries from R/Python on complex table configurations.
- Worked in a team of programmers and data analysts to develop insightful deliverables that support data-driven marketing strategies.
Environment: R/Python, Informatica 9.0, ODS, OLTP, SQL Profiler, Query Analyzer, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Oracle 11g/10g.
Confidential, Plano, TX
Data Analyst/Data Modeler
Responsibilities:
- Worked as Data Modeler/Analyst to generate data models using Erwin and developed a relational database system.
- Analyzed the business requirements of the projects by studying the Business Requirement Specification document.
- Worked extensively with the Erwin Data Modeler tool to design the data models.
- Designed mappings to process the incremental changes that exist in the source table. Whenever source data elements were missing in source tables, they were modified/added consistently with the 3NF-based OLTP source database.
- Designed tables and implemented the naming conventions for Logical and Physical Data Models in Erwin 7.0.
- Provided expertise and recommendations for physical database design, architecture, testing, performance tuning and implementation.
- Designed Logical and Physical Data Models for multiple OLTP and analytic applications.
- Extensively used the Erwin design tool and Erwin Model Manager to create and maintain the Data Mart.
- Designed the physical model for implementing the model in an Oracle 9i physical database.
- Involved in data analysis, primarily identifying data sets, data sources, source metadata, data definitions and data formats.
- Performed performance tuning of the database, including indexing, optimizing SQL statements and monitoring the server.
- Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
- Collaborated on the data mapping document from source to target and the data quality assessments for the data source.
- Applied expert-level understanding of different databases in combination for data extraction and loading, integrating data extracted from different databases and loading it into a specific database.
- Designed and developed Insurance, Claims and Financial Reports using SSRS.
Environment: SQL Server 2008 R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Windows Enterprise Server 2000, DTS, SQL Profiler, Query Analyzer, MS Visio, Erwin
Confidential, Buffalo, NY
Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different data sources.
- Designed and developed UDFs to extend the functionality of both Pig and Hive.
- Involved in collecting business requirements from business partners and Subject Matter Experts.
- Developed MapReduce programs to perform data filtering for unstructured data.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created partitioned Hive tables and worked on them using Hive.
- Imported and exported data between MySQL and HDFS using Sqoop on a regular basis.
- Used Flume to channel data from different sources to HDFS.
- Created HBase tables to store data depending on column families.
- Worked with the administrator to set up and monitor the Hadoop cluster.
- Supported MapReduce programs running on the cluster.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Provided upper management with daily updates on project progress, including the classification levels achieved on the data.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
Environment: Java, Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Linux, MySQL, Ubuntu.
Confidential
Java Developer
Responsibilities:
- Designed and developed the front end using JSP, HTML, JavaScript and jQuery.
- Designed the application by implementing the Struts Framework based on MVC architecture.
- Developed custom tags in Struts.
- Developed a framework for data processing using design patterns, Java and XML.
- Used Spring IoC for dependency injection with the Hibernate and Spring frameworks.
- Used the lightweight container of the Spring Framework to provide architectural flexibility through Inversion of Control (IoC).
- Developed EJB components deployed on WebLogic Application Server.
- Designed and developed Session beans to implement the business logic.
- Designed and developed various configuration files for Hibernate mappings.
- Wrote unit tests using the JUnit framework and implemented logging using the Log4j framework.
- Designed and developed SQL queries and stored procedures.
- Applied CSS (Cascading Style Sheets) across the entire site for standardization.
- Provided offshore coordination and user acceptance testing support.
- Actively involved in code reviews and bug fixing.
Environment: Java 5.0, Struts, Spring 2.0, Hibernate 3.2, WebLogic 7.0, Eclipse 3.3, Oracle 10g, JUnit 4.2, Maven, Windows XP, HTML, CSS, JavaScript, and XML.
Confidential
Java Developer
Responsibilities:
- Involved in the analysis and design of the application using Rational Rose.
- Performed Object-Oriented Analysis and Design using UML, including development of class, sequence and state diagrams, and implemented these diagrams in Microsoft Visio.
- Developed the various action classes to handle the requests and responses.
- Designed and created Java objects, JSP pages, JSF, JavaBeans and Servlets to achieve various business functionalities. Created validation methods using JavaScript and Backing Beans.
- Involved in writing client-side validations using JavaScript and CSS.
- Involved in the design of the Referential Data Service module to interface with various databases using JDBC.
- Used the Hibernate framework to persist the employee work hours to the database.
- Made extensive use of Spring Framework features.
- Developed and configured the application using BEA WebLogic Application Server.
- Developed the build scripts using Ant.
- Involved in designing test plans, test cases and overall unit testing of the system.
- Developed controllers and actions encapsulating the business logic.
- Developed classes to interface with the underlying web services layer.
- Designed web services for the above modules.
- Prepared documentation and participated in preparing the user's manual for the application.
Environment: Java, Spring 2.0, Hibernate 3.2, WebLogic, Eclipse, SQL Server 2008, JUnit 4.2, Ant, Windows XP, HTML, CSS, JavaScript, and XML.
