Data Scientist Resume
Atlanta, GA
SUMMARY
- 8+ years of experience in IT, with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions.
- Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
- Proficient in statistical modeling and machine learning techniques (linear and logistic regression, decision trees, random forest, SVM, k-nearest neighbors, Bayesian methods, XGBoost) for forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Experienced in designing compelling visualizations using Tableau and publishing and presenting dashboards and storylines on web and desktop platforms.
- Designed the physical data architecture of new system engines.
- Worked with and extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
- Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
- Regularly used JIRA and other internal issue trackers for project development.
- Skilled in system analysis, E-R/dimensional data modeling, database design, and implementing RDBMS-specific features.
- Hands-on experience implementing LDA and Naïve Bayes; skilled in random forests, decision trees, linear and logistic regression, SVM, clustering, neural networks, and Principal Component Analysis (PCA), with good knowledge of recommender systems (a minimal classification sketch follows this summary).
- Expertise in all aspects of the Software Development Life Cycle (SDLC), from requirements analysis and design through development, coding, testing, implementation, and maintenance.
- Hands-on experience applying machine learning and statistics to draw meaningful insights from data; strong at communication and storytelling with data.
- Extensive work with text data and Natural Language Processing (NLP), including generating autocomplete text for reviews and performing sentiment analysis on customer reviews.
- Utilized analytical applications and libraries such as Plotly, D3.js, and Tableau to identify trends and relationships between different pieces of data, draw appropriate conclusions, and translate analytical findings into marketing strategies that drive value.
- Experienced in working with enterprise search platforms such as Apache Solr and distributed real-time processing systems such as Storm.
- Hands-on experience with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction.
- Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiments), machine learning, algorithms, data structures, and data infrastructure.
- Experience extracting data to create value-added datasets using Python, R, SAS, Azure, and SQL, analyzing behavior to target specific sets of customers and surface hidden insights that advance project objectives.
- Experience creating data visualizations for KPIs per business requirements for various departments.
- Solid team player, team builder, and an excellent communicator.
- Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, scikit-learn, and Hadoop MapReduce.
- Technical proficiency in design and data modeling for online applications; solution lead for architecting data warehouse/business intelligence applications.
- Skilled in advanced regression modeling, correlation, multivariate analysis, model building, business intelligence tools, and the application of statistical concepts.
- Extensive experience working in Test-Driven Development and Agile-Scrum development.
- Experience working on Windows, Linux, and UNIX platforms, including programming and debugging skills in UNIX shell scripting.
- Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6, Ubuntu 13/14, and Cosmos.
- Defined job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
- Experience in data migration from existing data stores to Hadoop.
- Developed MapReduce programs to perform data transformation and analysis.
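A minimal sketch of one of the classification workflows named above, training and scoring a random forest with scikit-learn; the synthetic data and all parameters are illustrative, not drawn from any engagement described here:

    # Minimal sketch: random forest classification with scikit-learn.
    # Synthetic data; every parameter here is an illustrative assumption.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_train, y_train)

    print("Test accuracy: %.3f" % accuracy_score(y_test, clf.predict(X_test)))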
TECHNICAL SKILLS
Big Data/Hadoop Technologies: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, ZooKeeper, and Oozie
Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/RStudio, SAS Enterprise Guide, SAS, R (caret, Weka, ggplot), Perl, MATLAB, Mathematica, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, pandas, Gensim, Keras), JavaScript, Shell Scripting
NoSQL Databases: Cassandra, HBase, MongoDB
Business Intelligence Tools: Tableau Server, Tableau Reader, Tableau, Splunk, SAP BusinessObjects, OBIEE, SAP Business Intelligence, QlikView, Amazon Redshift, Azure Data Warehouse
Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans.
Development Methodologies: Agile/Scrum, Waterfall
Build Tools: Jenkins, Toad, SQL Loader, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SOAP UI
Databases: Microsoft SQL Server 2008/2008 R2/2012, MySQL 4.x/5.x, Oracle 11g/12c, DB2, Teradata, Netezza
Operating Systems: All versions of Windows, Linux, Mac OS X, Sun Solaris
PROFESSIONAL EXPERIENCE
Confidential - Atlanta, GA
Data Scientist
Responsibilities:
- Utilized Spark, Scala, Hadoop, HQL, VQL, Oozie, PySpark, Data Lake, TensorFlow, HBase, Cassandra, Redshift, MongoDB, Kafka, Kinesis, Spark Streaming, Edward, CUDA, MLlib, AWS, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
- Utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.
- Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python and MATLAB.
- Worked on analyzing data from Google Analytics, AdWords, Facebook etc.
- Evaluated models using cross-validation, the log loss function, and ROC curves, and used AUC for feature selection; worked with Elastic technologies such as Elasticsearch and Kibana (see the evaluation sketch after this role).
- Performed multinomial logistic regression, decision tree, random forest, and SVM modeling to classify whether a package would be delivered on time for a new route.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as Caffe and Neon.
- Developed Spark/Scala, R, and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
- Used the K-Means clustering technique to identify outliers and classify unlabeled data (see the clustering sketch after this role).
- Tracked sensor-driven operations until defined criteria were met using Airflow.
- Responsible for various data mapping activities from source systems to Teradata using utilities such as TPump, FastExport (FEXP), BTEQ, MLOAD, and FLOAD.
- Addressed overfitting by implementing regularization methods such as L1 and L2.
- Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
- Used MLlib, Spark's machine learning library, to build and evaluate different models.
- Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
- Developed MapReduce pipeline for feature extraction using Hive and Pig.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data.
- Created various types of data visualizations using Python and Tableau.
Environment: Python 2.x, CDH5, HDFS, Hadoop 2.3, Hive, Impala, AWS, Linux, Spark, Tableau Desktop, SQL Server 2014, Microsoft Excel, MATLAB, Spark SQL, PySpark.
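A minimal sketch of the evaluation and regularization work described above: an L1-regularized logistic regression scored with cross-validated ROC AUC via scikit-learn. The data is synthetic, and the penalty strength and fold count are illustrative assumptions:

    # Minimal sketch: L1-regularized logistic regression with
    # cross-validated ROC AUC. Synthetic data; parameters are assumptions.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # The L1 penalty drives uninformative coefficients to zero, which
    # regularizes the model and acts as implicit feature selection.
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)

    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print("Mean ROC AUC: %.3f (+/- %.3f)" % (auc.mean(), auc.std()))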
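Likewise, a minimal sketch of PCA-plus-K-Means outlier identification as described above; the random feature matrix, component count, cluster count, and 1% distance threshold are all illustrative assumptions:

    # Minimal sketch: PCA to reduce high-dimensional features, then
    # K-Means; points far from their cluster center are flagged as outliers.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 30))  # stand-in for a real feature matrix

    X_scaled = StandardScaler().fit_transform(X)
    X_reduced = PCA(n_components=5).fit_transform(X_scaled)

    kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_reduced)

    # Distance of each point to its assigned center; the top 1% are
    # treated as outliers (an assumed cutoff, tuned in practice).
    dists = np.linalg.norm(
        X_reduced - kmeans.cluster_centers_[kmeans.labels_], axis=1)
    outliers = dists > np.quantile(dists, 0.99)
    print("Flagged %d outliers" % outliers.sum())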
Confidential - Atlanta, GA
Data Scientist/Machine Learning
Responsibilities:
- Analyzed the business requirements of the project by studying the Business Requirement Specification document.
- Designed a mapping to process the incremental changes that exist in the source table; whenever source data elements were missing from source tables, they were modified or added in consistency with the third-normal-form OLTP source database.
- Developed propensity models for retail liability products to drive proactive campaigns.
- Extracted and tabulated data from multiple data sources using R and SAS.
- Performed data cleansing and transformation and created new variables using R.
- Designed tables and implemented the naming conventions for logical and physical data models in Erwin 7.0.
- Performed exploratory data analysis and data visualization using R and Tableau.
- Performed thorough EDA, including univariate and bivariate analysis, to understand intrinsic and combined effects.
- Worked with data governance, data quality, data lineage, and data architects to design various models and processes.
- Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- Utilized the ADO.NET object model to implement middle-tier components that interacted with a Microsoft SQL Server 2000 database.
- Participated in the AMS (Alert Management System) Java and Sybase project: designed the Sybase database using Erwin, customized error messages using sp_addmessage and sp_bindmsg, created indexes, made query optimizations, and wrote stored procedures and triggers in T-SQL.
- Explained the data model to the other members of the development team; wrote an XML parsing module that populates alerts from the XML file into database tables using Java, JDBC, the BEA WebLogic IDE, and the Document Object Model.
- As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
- Developed, implemented, and maintained conceptual, logical, and physical data models using Erwin for forward- and reverse-engineered databases.
- Explored and extracted data from source XML in HDFS, preparing it for exploratory analysis through data munging (a brief pandas-style sketch follows this role).
Environment: SQL Server 2008R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Hadoop, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer
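A minimal sketch of the kind of cleansing and derived-variable work described above. The role names R for these steps; a pandas equivalent is shown here to keep all examples in one language, and the column names are hypothetical:

    # Minimal sketch of data munging: de-duplication, type coercion,
    # and a derived variable. Hypothetical columns, not project data.
    import pandas as pd

    raw = pd.DataFrame({
        "customer_id": [1, 2, 2, 3],
        "balance": ["1,200", "950", "950", None],
        "opened": ["2015-01-10", "2016-03-05", "2016-03-05", "2017-07-21"],
    })

    df = (raw.drop_duplicates()  # remove repeated source rows
             .assign(balance=lambda d: pd.to_numeric(
                         d["balance"].str.replace(",", ""), errors="coerce"),
                     opened=lambda d: pd.to_datetime(d["opened"])))

    # Derived variable: account tenure in days, a typical new-variable step.
    df["tenure_days"] = (pd.Timestamp("2018-01-01") - df["opened"]).dt.days
    print(df)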
Confidential - Richmond, VA
Data Scientist/NLP Engineer
Responsibilities:
- Coded R functions to interface with Caffe Deep Learning Framework
- Worked in the Amazon Web Services (AWS) cloud computing environment.
- Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and space-time.
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
- Gathered all required data from multiple data sources and created datasets for use in the analysis.
- Performed exploratory data analysis and data visualization using R and Tableau.
- Performed thorough EDA, including univariate and bivariate analysis, to understand intrinsic and combined effects.
- Worked with data governance, data quality, data lineage, and data architects to design various models and processes.
- Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
- Developed, implemented, and maintained conceptual, logical, and physical data models using Erwin for forward- and reverse-engineered databases.
- Established data architecture strategy, best practices, standards, and roadmaps.
- Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team.
- Performed data cleaning and imputation of missing values using R (an imputation sketch follows this role).
- Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
- Took up ad-hoc requests from different departments and locations.
- Used Hive to store the data and performed data cleaning steps on huge datasets.
- Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
- Created customized business reports and shared insights with management.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions that address those needs.
Environment: Erwin, Informatica, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, PHP, etc.
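A minimal sketch of missing-value imputation as described above. The work was done in R; a pandas/scikit-learn equivalent is shown for consistency with the other examples, with hypothetical columns:

    # Minimal sketch: median imputation of missing numeric values.
    # Hypothetical columns; strategy choice depends on the distribution.
    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer

    df = pd.DataFrame({
        "age": [34, np.nan, 52, 41],
        "income": [58000, 72000, np.nan, 61000],
    })

    # Median imputation is robust to skew; mean or model-based
    # imputation are common alternatives.
    imputer = SimpleImputer(strategy="median")
    df[["age", "income"]] = imputer.fit_transform(df[["age", "income"]])
    print(df)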
Confidential - Charlotte, NC
Data Scientist
Responsibilities:
- Participated in JAD sessions and gathered information from business analysts, end users, and other stakeholders to determine the requirements.
- Worked with data warehousing methodologies and dimensional data modeling techniques such as star/snowflake schemas using Erwin 9.1.
- Hands-on experience in cloud computing with Azure: Storage, Compute, Databases (SQL, DocumentDB/Cosmos), Data Lake Store & Analytics, Data Factory, HDInsight, and Stream Analytics.
- Extensively used the Aginity Netezza workbench to perform DDL, DML, and other operations on the Netezza database.
- Designed the data warehouse and MDM hub conceptual, logical, and physical data models.
- Performed daily monitoring of Oracle instances using Oracle Enterprise Manager, ADDM, and TOAD; monitored users, tablespaces, memory structures, rollback segments, logs, and alerts.
- Used ER/Studio Data Modeler for data modeling (data requirements analysis, database design, etc.) of custom-developed information systems, including databases of transactional systems and data marts.
- Involved in Teradata SQL development, unit testing, and performance tuning, ensuring testing issues were resolved on the basis of defect reports.
- Customized reports using the SAS/MACRO facility, PROC REPORT, and PROC TABULATE.
- Used normalization methods up to 3NF and de-normalization techniques for effective performance in OLTP and OLAP systems.
- Generated DDL scripts using the forward engineering technique to create objects and deploy them into the databases.
- Worked on database testing; wrote complex SQL queries to verify transactions and business logic, such as identifying duplicate rows, using SQL Developer and PL/SQL Developer.
- Used Teradata SQL Assistant, Teradata Administrator, PMON, and data load/export utilities such as BTEQ, FastLoad, MultiLoad, FastExport, and TPump in UNIX/Windows environments, and ran batch processes for Teradata.
- Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Worked with data warehouse concepts such as data warehouse architecture, star schemas, snowflake schemas, data marts, and dimension and fact tables.
- Developed SQL queries to fetch complex data from different tables in remote databases using joins, database links, and bulk collects.
- Migrated databases from legacy systems and SQL Server to Oracle and Netezza.
- Used SSIS to create ETL packages that validate, extract, transform, and load data, pulling data from source servers to a staging database and then into Netezza and DB2 databases.
- Worked on SQL Server concepts: SSIS (SQL Server Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services).
Environment: ER Studio, Teradata 13.1, SQL, PL/SQL, BTEQ, DB2, Oracle, MDM, Netezza, ETL, RTF, UNIX, SQL Server 2010, Informatica, SSRS, SSIS, SSAS, SAS, Aginity
Confidential
Python Developer
Responsibilities:
- Exposed to various phases of the Software Development Life Cycle using the Agile-Scrum software development methodology.
- Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.
- Developed the customer complaints application using Django Framework, which includes Python code.
- Implemented web applications using the Django framework following the MVC architecture (a minimal view/URL sketch follows this role).
- Worked on the JavaScript MVC framework Angular.js and Python OpenStack APIs.
- Created the entire application using Python, Django, MySQL, and Linux.
- Created data tables utilizing PyQt to display patient and policy information and to add, delete, and update patient records.
- Involved in web design using HTML5, XHTML, CSS3, jQuery, and JavaScript; extensively used tableless design in CSS for positioning.
- Developed the required XML Schema documents and implemented the framework for parsing XML documents.
- Used jQuery and Ajax calls for transmitting JSON data objects between the frontend and controllers.
- Created a Git repository and added the project to GitHub.
- Designed and developed the UI of the website using HTML, XHTML, AJAX, CSS and JavaScript.
- Worked with MySQL and NoSQL databases on simple queries and wrote stored procedures for normalization and denormalization.
- Responsible for debugging and troubleshooting the web application.
Environment: Python, Django, PyQt, Angular.js, XML Schema, JavaScript, AJAX, jQuery, JSON, MySQL, Git, Apache, Linux, and Windows.
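A minimal sketch of a Django function-based view and URL route of the kind described above. Modern django.urls.path routing is shown (older Django used url() with regex patterns), and the view, template, and route names are hypothetical:

    # Minimal sketch of a Django view plus its URL route.
    # All names are hypothetical, not from the project.
    from django.shortcuts import render
    from django.urls import path

    def complaint_list(request):
        """Render a page listing customer complaints (placeholder data)."""
        complaints = [
            {"id": 1, "subject": "Late delivery"},
            {"id": 2, "subject": "Billing error"},
        ]
        return render(request, "complaints/list.html",
                      {"complaints": complaints})

    # In a real project this list lives in the app's urls.py, with the
    # view imported from views.py.
    urlpatterns = [
        path("complaints/", complaint_list, name="complaint-list"),
    ]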
Confidential
Junior Python Developer
Responsibilities:
- Involved in the requirements gathering, design, development, testing, deployment, and maintenance of the website
- Profiled Python code for optimization and memory management
- Used the Django framework and its configuration for application development
- Used HTML5/CSS3, XML, and JavaScript for UI development
- Used Angular.js as a mechanism to manage and organize the HTML page layout
- Performed data extraction and manipulation over large relational datasets using SQL, Python, and other analytical tools
- Used Python libraries and SQL queries/subqueries to create several datasets that produced statistics, tables, figures, charts, and graphs (a brief sketch follows this role)
- Trained users and provided production support
Environment: Python, Django, TDD, HTML5, CSS3, JavaScript, Angular.js, AJAX, jQuery, JSON, SQL, Agile, and Windows
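A minimal sketch of combining SQL and Python to produce a summary dataset, as described above; it uses an in-memory SQLite table with made-up rows, and the table and column names are hypothetical:

    # Minimal sketch: SQL aggregation feeding a pandas DataFrame.
    # In-memory SQLite with made-up rows; names are hypothetical.
    import sqlite3

    import pandas as pd

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (region TEXT, amount REAL);
        INSERT INTO orders VALUES
            ('East', 120.0), ('East', 80.0), ('West', 200.0);
    """)

    # SQL does the heavy aggregation; pandas handles the presentation
    # and any follow-on statistics or charts.
    summary = pd.read_sql_query(
        "SELECT region, COUNT(*) AS n_orders, AVG(amount) AS avg_amount "
        "FROM orders GROUP BY region",
        conn,
    )
    print(summary)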