Data Scientist Resume
Atlanta, GA
SUMMARY
- 8+ years of experience in IT, including 5+ years as a Data Scientist, with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions.
- Hands-on working experience in machine learning and statistics to draw meaningful insights from data; strong at communication and storytelling with data.
- Expertise in all aspects of the Software Development Life Cycle (SDLC): requirement analysis, design, development, coding, testing, implementation, and maintenance.
- Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
- Experience designing visualizations in Tableau and publishing and presenting dashboards and storylines on web and desktop platforms.
- Use analytical applications and libraries such as Plotly, D3.js, and Tableau to identify trends and relationships between different pieces of data, draw appropriate conclusions, and translate analytical findings into marketing strategies that drive value.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Proficient in statistical modeling and machine learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) for forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles.
- Experience designing the physical data architecture of new system engines.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
- Technical proficiency in designing and data modeling online applications; Solution Lead for architecting Data Warehouse/Business Intelligence applications.
- Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
- Regularly use JIRA and other internal issue trackers for project development.
- Experienced in working with enterprise search platforms such as Apache Solr and distributed real-time processing systems such as Storm.
- Hands-on experience with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction.
- Defined job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
- Experience in data migration from existing data stores to Hadoop; developed MapReduce programs to perform data transformation and analysis.
- Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, scikit-learn, and Hadoop MapReduce.
- Expertise in implementing core concepts of Java and JEE technologies, including JSP, Servlets, JSTL, EJB, JMS, Struts, Spring, Hibernate, JDBC, XML, Web Services, and JNDI.
- Extensive experience working in Test-Driven Development and Agile/Scrum environments.
- Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, and Cosmos.
- Skilled in system analysis, E-R/dimensional data modeling, database design, and implementing RDBMS-specific features; extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
TECHNICAL SKILLS
Languages: C, C++, Python, T-SQL, PL/SQL, SQL, XML, HTML, DHTML, HTTP, Matlab, DAX.
Databases: SQL Server, MS-Access, Oracle 11g/10g/9i and Teradata, Big data, Hadoop, Cassandra.
DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.
Database Design Tools & Data Modeling: ERWIN 4.5/4.0, MS Visio, Star Schema/Snowflake Schema modeling, Fact & Dimensions tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies.
Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .Net, Microsoft Management Console, Visual Source Safe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA.
PROFESSIONAL EXPERIENCE
Confidential, Atlanta, GA
Data Scientist
Responsibilities:
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for clients.
- Set up storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
- Worked with data formats such as JSON and XML and implemented machine learning algorithms in Python; installed and used the Caffe deep learning framework.
- Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
- Worked with Data Architects and IT Architects to understand the movement of data and its storage; used ER Studio 9.7.
- Handled data transformation from various resources, data organization, and feature extraction from raw and stored data.
- Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
- Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
- Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- Implemented Agile methodology for building an internal application; as Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
- Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that each document could be assigned a response label for further classification.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions; identified and executed process improvements; hands-on with technologies such as Oracle, Informatica, and Business Objects.
Environment: R 9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose.
Confidential, Melbourne, FL
Data Scientist
Responsibilities:
- Analyzed the business requirements of the project by studying the Business Requirement Specification document.
- Worked extensively with the Erwin Data Modeler tool to design the data models.
- Designed a mapping to process the incremental changes that exist in the source table. Whenever source data elements were missing in source tables, these were modified or added in consistency with the third-normal-form-based OLTP source database.
- Designed tables and implemented the naming conventions for Logical and Physical Data Models in Erwin 7.0.
- Participated in the conversion of ITS (Immigration Tracking System) Visual Basic client-server application into C#, ASP.NET 3-tier Intranet application.
- Performed Exploratory Data Analysis and Data Visualizations using R and Tableau.
- Performed thorough EDA, including univariate and bivariate analysis, to understand intrinsic and combined effects.
- Worked with Data Governance, Data Quality, Data Lineage, and Data Architects to design various models and processes.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- Participated in the AMS (Alert Management System) Java and Sybase project: designed the Sybase database using Erwin, customized error messages using sp_addmessage and sp_bindmsg, created indexes, optimized queries, and wrote stored procedures and triggers in T-SQL.
- Explained the data model to the other members of the development team. Wrote an XML parsing module that populates alerts from the XML file into the database tables using Java, JDBC, the BEA WebLogic IDE, and the Document Object Model.
- As an Architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
- Developed, implemented, and maintained the Conceptual, Logical, and Physical Data Models using Erwin for forward- and reverse-engineered databases.
- Explored and extracted data from source XML in HDFS, preparing data for exploratory analysis using data munging.
Environment: SQL Server 2008R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Hadoop, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer.
Confidential, Chicago, IL
Data Scientist
Responsibilities:
- Coded R functions to interface with the Caffe deep learning framework.
- Worked in the Amazon Web Services cloud computing environment.
- Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, space-time.
- Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop, and MongoDB.
- Gathered all required data from multiple data sources and created datasets for use in the analysis.
- Performed Exploratory Data Analysis and Data Visualizations using R and Tableau.
- Performed thorough EDA, including univariate and bivariate analysis, to understand intrinsic and combined effects.
- Worked with Data Governance, Data Quality, Data Lineage, and Data Architects to design various models and processes.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- As an Architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
- Developed, implemented, and maintained the Conceptual, Logical, and Physical Data Models using Erwin for forward- and reverse-engineered databases.
- Established Data architecture strategy, best practices, standards, and roadmaps.
- Performed data cleaning and imputation of missing values using R.
- Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
- Took up ad hoc requests from different departments and locations.
- Used Hive to store the data and perform data cleaning steps for huge datasets.
- Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
- Created customized business reports and shared insights with management.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
Environment: Erwin r, Informatica, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.
Confidential, New York
Data Scientist
Responsibilities:
- Performed statistical modeling with ML to draw insights from data under the guidance of the Principal Data Scientist.
- Performed data modeling with Pig, Hive, and Impala.
- Performed data ingestion with Sqoop and Flume.
- Used SVN to commit changes to the main EMM application trunk.
- Worked with Ajax API calls to communicate with Hadoop through an Impala connection and SQL to render the required data. These API calls are similar to Microsoft Cognitive API calls.
- Strong working knowledge of Cloudera and HDP ecosystem components.
- Used Elasticsearch (Big Data) to retrieve data into the application as required.
- Ran MapReduce programs on the cluster.
- Involved in loading data from RDBMS and weblogs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Developed Hive queries for Analysis across different banners.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Hands-on experience working with SequenceFile, Avro, and HAR file formats and compression.
- Used Hive to partition and bucket data.
- Wrote MapReduce programs with the Java API to cleanse structured and unstructured data.
- Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
- Worked on improving the performance of existing Pig and Hive Queries.
Environment: SQL Server, Oracle, MS-Office, Teradata, Informatica, ER Studio, XML, Business Objects, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.
Confidential
Data Architect/Data Modeler
Responsibilities:
- Worked with large amounts of structured and unstructured data.
- Applied knowledge of machine learning concepts (generalized linear models, regularization, random forest, time series models, etc.).
- Worked with Business Intelligence and visualization tools such as Business Objects, Tableau, ChartIO, etc.
- Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
- Configured the project on WebSphere 6.1 application servers.
- Implemented the online application by using Core Java, JDBC, JSP, Servlets and EJB 1.1, Web Services, SOAP, WSDL.
- Exchanged information with other health care systems using Web Services via SOAP, WSDL, and JAX-RPC.
- Used Singleton, Factory, and DAO design patterns based on the application requirements.
- Used SAX and DOM parsers to parse the raw XML documents.
- Used RAD as Development IDE for web applications.
- Prepared and executed unit test cases.
- Used Log4J logging framework to write Log messages with various levels.
- Involved in fixing bugs and minor enhancements to the front-end modules.
- Used Microsoft Visio and Rational Rose to design use case diagrams, class models, sequence diagrams, and activity diagrams for the application's SDLC process.
- Conducted functional and technical reviews.
- Supported the testing team during system testing, integration testing, and UAT.
- Ensured quality in the deliverables.
- Implemented the project in a Linux environment.
Environment: R, Erwin, Tableau, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.
