
Data Scientist Resume

Atlanta, GA

SUMMARY:

  • Expertise in the complete software development lifecycle process, including Analysis, Design, Development, Testing, and Implementation, in the Hadoop ecosystem, the Documentum 6.5 SP2 suite of products, and Java technologies.
  • Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Extensive experience in Hive, Sqoop, Flume, Hue, and Oozie.
  • Working knowledge of Microsoft .NET 4.5/4.0/3.5/3.0.
  • Used Teradata 15 utilities such as FastExport and MLOAD for various data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Designed, coded, and unit tested ETL packages for source marts and subject marts using Informatica ETL processes against an Oracle database.
  • Used Amazon Web Services (AWS) S3, EC2, and EMR, as well as Python (Pandas, SciPy, PyQt), bash scripting, and Spark to model large-scale time series data.
  • Created 3NF business area data models with de-normalized physical implementations, and performed data and information requirements analysis using the ERwin tool.
  • Experienced in using Python to manipulate data for loading and extraction, working with Python libraries such as Matplotlib, NumPy, SciPy, and Pandas for data analysis.
  • Very good experience and knowledge in provisioning virtual clusters under AWS cloud which includes services like EC2, S3, and EMR.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.
  • Proficient in data science programming using R, Python, and SQL.
  • Set up storage and data analysis tools on Amazon Web Services cloud computing infrastructure.
  • Developed code per client requirements using SQL, PL/SQL, and data warehousing concepts.
  • Experience in various phases of Software Development life cycle (Analysis, Requirements gathering, Designing) with expertise in documenting various requirement specifications, functional specifications, Test Plans, Data Validation, Source to Target mappings, SQL Joins, Data Cleansing.
  • Experience in conducting Joint Application Development (JAD) sessions for requirements gathering, analysis, and design, as well as Rapid Application Development (RAD) sessions; experience working with data quality tools Informatica IDQ 9.1 and Informatica MDM 9.1.
  • Collaborated with the lead Data Architect to model the Data warehouse in accordance with FSLDM subject areas, 3NF format, and Snowflake schema.
  • Proficient in SAS/BASE, SAS EG, SAS/SQL, SAS Macro, and SAS/ACCESS.
  • Experience in end-to-end implementation of a data warehouse project based on SAS EG.
  • Experience in extracting data from databases such as DB2, Oracle, SME-IM, MAD, and M240, and from UNIX servers, using SAS.
  • Knowledge in Business Intelligence tools like Business Objects, Cognos, Tableau, and OBIEE
  • Integration Architect & Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL and Cloud technologies.
  • Built CoE (Center of Excellence) competencies in the areas of Analytics, SOA/EAI, ETL, and BPM.
  • Experience in foundational machine learning models and concepts: regression, random forests, and boosting (a minimal sketch follows this list).
  • Experience in machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, etc.
  • Working knowledge of DICOM and Problem Loan Management applications.
  • Highly skilled in using visualization tools like Tableau, ggplot2, and d3.js for creating dashboards.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design, and implementing RDBMS-specific features.
  • Strong experience in interacting with stakeholders/customers, gathering requirements through interviews, workshops, and existing system documentation or procedures, defining business processes, identifying and analyzing risks using appropriate templates and analysis tools.
  • Experience in designing star schema and Snowflake schema for Data Warehouse and ODS architecture.
  • Experience in designing visualizations using Tableau software and publishing and presenting dashboards and storylines on web and desktop platforms.
  • Good experience in NLP with Apache Hadoop and Python.
  • Experience working with data modeling tools like Erwin, Power Designer, and ER Studio.
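
The following is a minimal, hypothetical sketch of the kind of baseline modeling referenced above (random forests and gradient boosting), using scikit-learn on a synthetic dataset; the data and parameters are illustrative only, not drawn from any engagement described in this resume.

    # Minimal baseline-modeling sketch: random forest vs. gradient boosting
    # on a synthetic dataset (illustrative only).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    for model in (RandomForestClassifier(n_estimators=200, random_state=42),
                  GradientBoostingClassifier(random_state=42)):
        model.fit(X_train, y_train)
        print(type(model).__name__, accuracy_score(y_test, model.predict(X_test)))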

TECHNICAL SKILLS:

Data Modeling Tools: Erwin r9.6/9.5, ER/Studio 9.7, Star-Schema Modeling, Snowflake-Schema Modeling, FACT and dimension tables, Pivot Tables.

Frameworks: Microsoft .NET 4.5/4.0/3.5/3.0, Entity Framework, Bootstrap, Microsoft Azure, Swagger.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimensions tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

Version Control & Unit Testing: TFS, Microsoft Visual SourceSafe, Git, NUnit, MSUnit

Software Packages: MS Office 2003/07/10/13, MS Access, Messaging Architectures.

Operating Systems: Windows 8/XP/NT/95/98/2000/2008/2012, Android SDK.

Other Technologies: PHP, Scala 2, Shark 2, Awk, Cascading, Cassandra, Clojure, Fortran, JavaScript, JMP, Mahout, Objective-C, QlikView, Redis, Redshift

Web Technologies: Windows API, Web Services, Web API (RESTful), HTML5, XHTML, CSS3, AJAX, XML, XAML, MSMQ, Silverlight, Kendo UI.

DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Programming Languages: C#, VB.NET, VB6, VBScript, OOP, data structures, algorithms, Python, R, Java, JavaScript, SQL, J2EE, C, C++, and XML.

Development Tools: R, SQL, Python, Hadoop, SAS, Java, Hive, MATLAB, RStudio, MS Office, Visual Studio 2010

Big Data Tools: Hadoop 2.7.2, Hive, Spark 2.1.1, Pig, HBase, Sqoop, Flume.

PROFESSIONAL EXPERIENCE:

Confidential, ATLANTA, GA

Data Scientist

Responsibilities:

  • Built models using Statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest.
  • Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Created various types of data visualizations using R, Python, and Tableau.
  • Implemented Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction (a minimal pipeline sketch follows this list).
  • Used R and Python for exploratory data analysis, A/B testing, ANOVA, and hypothesis testing to compare and identify the effectiveness of creative campaigns.
  • Used common data science toolkits, such as R, Python, NumPy, Keras, Theano, TensorFlow, etc.
  • Designed, coded, and unit tested ETL packages for source marts and subject marts using Informatica ETL processes for an Oracle database.
  • Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data sources.
  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
  • Interaction with Business Analyst, SMEs, and other Data Architects to understand Business needs and functionality for various project solutions
  • Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, Business Objects.
  • Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and snowflake schemas.
  • Manipulating/mining data from database tables (Redshift, Oracle, Data Warehouse)
  • Utilized a broad variety of statistical packages like SAS, R, MLlib, graphs, Hadoop, Spark, MapReduce, Pig, and others.
  • Interface with other technology teams to extract, transform, and load (ETL) data from a wide variety of data sources
  • Owned the functional and nonfunctional scaling of software systems in my area.
  • Provided input and recommendations on technical issues to BI Engineers, Business & Data Analysts, and Data Scientists.
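
A minimal sketch of a Spark MLlib classification pipeline of the kind described above; the input path, column names, and parameters are hypothetical assumptions, not details from the actual project.

    # Hypothetical Spark ML classification pipeline sketch; the input path and
    # column names ("f1"-"f3", "label") are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()
    df = spark.read.parquet("s3://example-bucket/training-data/")  # placeholder path

    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)
    model = Pipeline(stages=[assembler, rf]).fit(df)

    auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(df))
    print("AUC:", auc)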

Environment: Teradata 13.1, Informatica 6.2.1, Ab Initio, Business Objects, Oracle 9i, PL/SQL, Microsoft Office Suite (Excel, VLOOKUP, Pivot, Access, PowerPoint), Visio, VBA, Micro Strategy, Tableau, ERWIN.

Confidential, Memphis, TN

Data Scientist

Responsibilities:

  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER Studio 9.7.
  • Processed huge datasets (over a billion data points, over 1 TB of data) for data association pairing and provided insights into meaningful data associations and trends.
  • Developed cross-validation pipelines for testing the accuracy of predictions
  • Enhanced statistical models (linear mixed models) for predicting the best products for commercialization, using machine learning linear regression models, KNN, and K-means clustering algorithms (a minimal sketch follows this list).
  • Executed ad-hoc data analysis for customer insights using SQL on an Amazon AWS Hadoop cluster.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM following its acquisition of Identity Systems.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As an Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MLOAD for various data migration/ETL tasks from OLTP source systems to OLAP target systems.
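
A minimal sketch of the cross-validation and clustering work described above, using scikit-learn on synthetic data; everything here is illustrative rather than taken from the actual engagement.

    # Illustrative cross-validation (KNN) and K-means clustering sketch on synthetic data.
    from sklearn.datasets import make_blobs
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.cluster import KMeans

    X, y = make_blobs(n_samples=500, centers=3, random_state=0)

    # 5-fold cross-validation of a KNN classifier
    knn_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
    print("KNN accuracy per fold:", knn_scores)

    # K-means clustering of the same points
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    print("Cluster sizes:", [int((labels == k).sum()) for k in range(3)])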

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, GIT, Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression

Confidential, Minnesota

Data Engineer

Responsibilities:

  • Analyzed, designed, and processed large datasets (over 200 GB) of 24-36 base-paired nucleotides.
  • Analyzed large sequencing datasets for identification of key candidates that can be prioritized for further experiments.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • As an Architect implemented MDM hub to provide clean, consistent data for an SOA implementation.
  • Developed, implemented, and maintained the Conceptual, Logical, and Physical Data Models using Erwin for forward/reverse-engineered databases.
  • Established data architecture strategy, best practices, standards, and roadmaps.
  • Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team.
  • Developed entire frontend and backend modules using Python on Django Web Framework.
  • Sequenced, assembled, and annotated the transcriptome (51 base-pair reads, millions of data points).
  • Created Perl programs for combining and analyzing transcriptome and small RNA datasets to identify novel and conserved microRNAs
  • Established the core bioinformatics infrastructure for the university department
  • Leveraged cloud computation (AWS, UC Farm cluster), thereby reducing establishment costs.
  • Provided leadership and support to colleagues and research teams in data analysis and techniques at Confidential, UC Davis, and with collaborative teams.
  • Used linear regression models to predict data associations between different individuals and their traits (a minimal sketch follows this list).
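
A minimal sketch of the linear-regression modeling described above; the input file and column names are hypothetical placeholders, not the original dataset.

    # Hypothetical linear-regression sketch for trait prediction; the CSV path and
    # column names are illustrative placeholders.
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    df = pd.read_csv("traits.csv")                    # placeholder input file
    X = df[["marker_1", "marker_2", "marker_3"]]      # placeholder predictor columns
    y = df["trait_value"]                             # placeholder response column

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
    model = LinearRegression().fit(X_train, y_train)
    print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))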

Environment: Unix, Python 3.5, MLlib, SAS, regression, logistic regression, Hadoop 2.7, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML and MapReduce.

Confidential, Austin, TX

Data Engineer

Responsibilities:

  • Designed ER diagrams (Physical and Logical using Erwin) and mapping the data into database objects and produced Logical /Physical Data Models.
  • Performed data analysis, data migration, and data profiling using complex SQL on various source systems, including Oracle and Teradata 13.1.
  • Participated in Normalization/De-normalization, Normal Forms, and database design methodology; expertise in using data modeling tools like MS Visio and Erwin for logical and physical design of databases.
  • Participated in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.
  • Extensively used Base SAS, SAS/Macro, SAS/SQL, and Excel to develop code and generate various analytical reports.
  • Implemented the full lifecycle as Data Modeler/Data Analyst for data warehouses and data marts with Star Schemas, Snowflake Schemas, SCDs, and dimensional modeling in Erwin.
  • Used Erwin for reverse engineering to connect to existing databases and the ODS, creating graphical representations in the form of entity relationships and eliciting more information.
  • Used Model Manager Option in Erwin to synchronize the data models in Model Mart approach.
  • Documented the transformation rules for data migration from OLTP to Warehouse environment for reporting purposes with help from ETL team.
  • Created 3NF business area data models with de-normalized physical implementations, and performed data and information requirements analysis using the Erwin tool.
  • Performed data mining using very complex SQL queries, discovered patterns, and used extensive SQL for data profiling/analysis to provide guidance in building the data model (a minimal profiling sketch follows this list).
  • Created high-level ETL design documents and assisted ETL developers in the detailed design and development of ETL maps using Informatica.
  • Integrated various relational and non-relational sources such as DB2, Teradata, Oracle, SFDC, Netezza, SQL Server, COBOL, XML and Flat Files.
  • Imported and cleansed high-volume data from various sources like Teradata, Oracle, flat files, and SQL Server 2008.
  • Developed dimensional model for Data Warehouse/OLAP applications by identifying required facts and dimensions.
  • Worked on SQL Server concepts SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).
  • Developed code per client requirements using SQL, PL/SQL, and data warehousing concepts.
  • Participated in several facets of MDM implementations including Data Profiling, metadata acquisition and data migration.
  • Identified and tracked the slowly changing dimensions (SCD), heterogeneous sources and determined the hierarchies in dimensions.
  • Migrated the Current Optum Rx Data Warehouse from the series database environment to a Netezza appliance.
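
A minimal data-profiling sketch in the spirit of the SQL-based profiling described above, shown here in pandas against a hypothetical extract file; the original work used SQL against Oracle/Teradata, so this is a stand-in illustration only.

    # Column-level data-profiling sketch in pandas (stand-in for the original SQL profiling);
    # the input file is a hypothetical placeholder.
    import pandas as pd

    df = pd.read_csv("source_extract.csv")   # placeholder source extract

    profile = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_count": df.isna().sum(),
        "null_pct": (df.isna().mean() * 100).round(2),
        "distinct_values": df.nunique(),
    })
    print(profile.sort_values("null_pct", ascending=False))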

Environment: ERWIN r9.1, SQL server 2008, Business Objects XI, MS Excel 2010, Informatica, Rational Rose, Oracle 10g, SAS, SQL, PL/SQL, SSRS, SSIS, Confidential -SQL, Netezza, Tableau, XML, DDL, TOAD for Data Analysis, Teradata SQL Assistant.

Confidential

Data Engineer

Responsibilities:

  • Work with users to identify the most appropriate source of record and profile the data required for sales and service.
  • Document the complete process flow to describe program development, logic, testing, and implementation, application integration, coding.
  • Involved in defining the business/transformation rules applied for sales and service data.
  • Define the list codes and code conversions between the source systems and the data mart.
  • Worked with internal architects, assisting in the development of current and target state data architectures.
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Involved in defining the source to target data mappings, business rules, data definitions.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Documented, clarified, and communicated change requests with the requestor, and coordinated with the development and testing teams.
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
  • Maintained the Enterprise Metadata Library with any changes or updates.
  • Prepared data quality and traceability documents for each source interface.
  • Established standard operating procedures.
  • Generate weekly and monthly asset inventory reports.
  • Coordinated with business users to provide an appropriate, effective, and efficient way to design new reporting needs based on user requirements and existing functionality.
  • Remain knowledgeable in all areas of business operations to identify systems needs and requirements.
  • Implementation of Metadata Repository, Maintaining Data Quality, Data Cleanup procedures, Transformations, Data Standards, Data Governance program, Scripts, Stored Procedures, triggers and execution of test plans
  • Performed data quality checks in Talend Open Studio; worked extensively in Oracle SQL, PL/SQL, SQL*Loader, and query performance tuning, and created DDL scripts and database objects such as tables, views, indexes, synonyms, and sequences.
  • Worked on transferring data files to vendors through SFTP and FTP processes (a minimal SFTP sketch follows this list).
  • Involved in defining and constructing customer-to-customer relationships based on association to an account and customer.
  • Assisted in the development of a client-centric Master Data Management (MDM) solution.
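
A minimal sketch of the SFTP file transfer described above, using the paramiko library; the host, credentials, and file paths are placeholders, not real values from the engagement.

    # Hedged SFTP-transfer sketch with paramiko; endpoint, credentials, and paths
    # are placeholder values only.
    import paramiko

    HOST, PORT = "sftp.vendor.example.com", 22             # placeholder endpoint
    USER, KEYFILE = "svc_account", "/path/to/id_rsa"       # placeholder credentials

    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(HOST, port=PORT, username=USER, key_filename=KEYFILE)

    sftp = ssh.open_sftp()
    sftp.put("/data/outbound/extract.csv", "/inbound/extract.csv")  # local -> remote
    sftp.close()
    ssh.close()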

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, GIT, Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce

Confidential

Python developer

Responsibilities:

  • Analysis and Design of application.
  • Created the UI using JavaScript and HTML5/CSS.
  • Developed and tested many features for dashboard using Python, Bootstrap, CSS, and JavaScript.
  • Implemented business logic using Python/Django.
  • Created backend database Confidential -SQL stored procedures and Jasper Reports.
  • Worked with millions of database records on a daily basis, finding common errors and bad data patterns and fixing them.
  • Exported/Imported data between different data sources using SQL Server Management Studio.
  • Maintained program libraries, users' manuals and technical documentation.
  • Managed large datasets using Pandas DataFrames and MySQL.
  • Wrote and executed various MySQL database queries from Python using the Python-MySQL connector and MySQLdb package.
  • Carried out various mathematical operations for calculation purposes using Python libraries.
  • Built various graphs for business decision-making using the Python Matplotlib library.
  • Fetched Twitter feeds for certain important keywords using the python-twitter library.
  • Used the Python library Beautiful Soup for web scraping (a minimal sketch follows this list).
  • Troubleshot, fixed, and deployed many Python bug fixes for the two main applications that were the primary source of data for both customers and the internal customer service team.
  • Implemented code in Python to retrieve and manipulate data.
  • Created the most important business rules relevant to the scope of the project and the needs of customers.
  • Built SQL queries for performing various CRUD operations: create, read, update, and delete.
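
A minimal web-scraping sketch with requests and Beautiful Soup, of the kind referenced above; the URL and the tags being extracted are hypothetical examples.

    # Illustrative web-scraping sketch; the URL and extracted tags are placeholders.
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get("https://example.com/articles", timeout=10)  # placeholder URL
    soup = BeautifulSoup(resp.text, "html.parser")

    # Print the text and link target of every anchor tag on the page
    for link in soup.find_all("a"):
        print(link.get_text(strip=True), "->", link.get("href"))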

Environment: Python 2.7, Django, HTML5/CSS, MS SQL Server 2013, Confidential -SQL, Jasper Reports, Javascript, Eclipse, Linux, Shell Scripting.
