Data Scientist Resume
Edgewood, NY
PROFESSIONAL SUMMARY:
- 8+ years of experience developing database, ETL, and reporting projects. Worked on Oracle and SQL Server databases, Integration Services, and Reporting Services (SSIS/SSRS/SSAS, MS SQL), with a strong background in SQL Server 2014/2012/2008 R2/2008/2005, stored procedure development, and triggers.
- Expertise in SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS).
- Good knowledge of writing Spark applications using Python, Scala, and Java.
- Efficient in analyzing data using HiveQL and Pig Latin, partitioning existing data sets with static and dynamic partitions, and tuning data for optimal query performance.
- Good experience with data transformation and storage: HDFS, MapReduce, and Spark. Good understanding of HDFS architecture and experience writing MapReduce jobs.
- Experience working with major components of the Hadoop ecosystem, including Hadoop MapReduce, Apache Crunch, HDFS, Hive, HCatalog, Pig, Falcon, Sqoop, Scala, HBase, Flume, Spark, Storm, Kafka, Oozie, and Zookeeper.
- Experienced in Database development, ETL, OLAP, OLTP operations.
- Experience with T - SQL, Table Partitions, Views, Stored Procedures, Functions, Triggers, Common Table Expressions and Indexes.
- Expertise in data extraction from several databases, including Oracle, SQL Server, and other external files, using SAS/ACCESS, the SAS/SQL pass-through facility, LIBNAME statements, the PROC IMPORT procedure, and SAS DATA steps.
- Used pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python to develop machine learning solutions, applying algorithms such as linear regression, multivariate regression, naive Bayes, random forests, K-means, and KNN for data analysis (a minimal sketch follows this summary).
- Extensive knowledge on Azure Data Lake and Azure Storage.
- Experience in migration from heterogeneous sources including Oracle to MS SQL Server.
- Experience in developing Custom Report and different types of Tabular Reports, Matrix Reports, Ad hoc reports and distributed reports in multiple formats using SQL Server Reporting Services (SSRS) in Business Intelligence Development Studio (BIDS).
- Excellent database administration (DBA) skills, including user authorization, database creation, tables, indexes, and backup creation.
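
A minimal sketch of the scikit-learn modeling loop behind the summary bullet above, using synthetic data in place of any real data set; the regression and clustering algorithms from the same list (linear regression, K-means) follow the same fit/score pattern:

```python
# Fit and compare several of the classifiers named above on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))  # held-out accuracy
```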
TECHNICAL SKILLS:
Database Systems: Microsoft SQL Server 2008, MySQL 4.x/5.x, Oracle 10g, 11g, 12c, DB2
Tools: SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), DTS
Scripting: Microsoft Command Shell Scripting, UNIX Shell Scripting
Oracle Tools: TOAD, Oracle Developer, SQL*Plus, SQL Navigator
Languages: C, C++, R, Python, SQL, PL/SQL, Pig, Hive, Spark, Scala, Click, MapReduce
Machine Learning: NLP, Deep Learning, Sentiment Analysis, Text Mining
Development Tools: SQL Server Management Studio, Business Intelligence Development Studio, SQL Profiler, Query Analyzer, Hadoop, Tableau, QlikView
Operating Systems: Windows XP/Vista/7/8/10, Linux AS 3.0 & 4.0
Methodologies: Agile - Scrum, RAD, Waterfall, Prototyping, Hybrid
PROFESSIONAL EXPERIENCE:
Confidential, Edgewood, NY
Data Scientist
Responsibilities:
- Worked as a Data Modeler/Analyst generating data models using Erwin and developed relational database systems.
- Extensively worked with the Erwin Data Modeler tool to design the data models.
- Used pandas for data munging and scikit-learn for the ML and NLP algorithms; read data into pandas DataFrames and used pandas features to process text data into numeric data.
- Involved in transforming data from legacy tables into HDFS and HBase tables using Sqoop.
- Researched reinforcement learning and control (TensorFlow, Torch) and machine learning models (scikit-learn).
- Built an image classifier distinguishing four animal classes from 1,280 images with a CNN model, reaching over 66% accuracy in 4 minutes.
- Built a sentiment analysis model classifying IMDB reviews as positive or negative via logistic regression, with accuracy over 89% (a minimal sketch appears after this list).
- Configured, supported, and maintained all network, firewall, storage, load balancer, operating system, and software components in AWS EC2. Used AWS Import/Export, which accelerates moving large amounts of data into and out of AWS using portable storage devices, and added project users to the AWS account with multi-factor authentication enabled and least-privilege permissions.
- Employed the Bayesian network software GeNIe for statistical and probability analysis, creating a hybrid model in GeNIe to predict and plot probability distributions over financial data. Exploring the data in the BN software helped the team understand it and make future predictions through probability distributions; graph analytics was incorporated into the GeNIe hybrid model by discretizing the probability distributions.
- Created Azure HDInsight clusters using Azure Data Lake; used PySpark, Kafka, and R to implement this activity.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a PySpark sketch of the same idea appears after this list).
- Developed Oozie workflows to run multiple Hive, Pig, Sqoop, and Spark jobs.
- Migrated existing MapReduce programs to Spark models using Python. Developed predictive analytics using Apache Spark and Scala APIs.
- Handled importing of data from various data sources, performed transformations using Hive and Spark, and loaded data into HDFS.
- Developed Pig, Hive, Sqoop, Hadoop Streaming, and Spark actions in Oozie for workflow management.
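
A minimal sketch of the IMDB sentiment classifier described above; the two placeholder reviews stand in for the real IMDB data, and the TF-IDF front end is an assumption (the bullet only specifies logistic regression):

```python
# Bag-of-words sentiment pipeline: TF-IDF features + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = ["a wonderful, moving film", "dull plot and wooden acting"]
labels = [1, 0]  # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(reviews, labels)
print(clf.predict(["surprisingly good"]))  # sentiment of a new review
```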
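The Hive-to-Spark conversion above was done with Scala and Spark RDDs; below is a sketch of the same idea in PySpark (table, column, and output names are hypothetical):

```python
# PySpark sketch of turning a Hive aggregate query into RDD transformations.
# Equivalent Hive/SQL:
#   SELECT category, COUNT(*) FROM sales WHERE amount > 0 GROUP BY category
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-spark-sketch")
         .enableHiveSupport()
         .getOrCreate())

rows = spark.table("sales").rdd  # hypothetical Hive table
counts = (rows
          .filter(lambda r: r["amount"] > 0)
          .map(lambda r: (r["category"], 1))
          .reduceByKey(lambda a, b: a + b))

counts.toDF(["category", "cnt"]).write.mode("overwrite").saveAsTable("sales_by_category")
```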
Environment: Oracle 12c/11g/10g, SQL Server 2008R2/2005 Enterprise, R, Python, Hadoop, HDFS, Hive, Spark, Scala, Pig, SSRS, SSIS, Machine Learning, NLP, Crystal Reports, Windows Enterprise Server, SQL Profiler.
Confidential - Troy, MI
Data Scientist
Responsibilities:
- Prepared the workspace for Markdown. Performed data analysis and statistical analysis; generated reports, listings, and graphs.
- Used SAS and Python to identify drug performance via classification, tree map, and regression models, along with visualizing data for interactive understanding and decision making.
- Created SAS data sets from database tables using SAS/ACCESS; retrieved sales data from flat files and an Oracle database and converted it to SAS data sets for analysis using SAS/STAT procedures.
- Wrote various queries using SAS/Base, SAS/SQL, and SAS macros to create reports according to user requirements.
- Improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, and Spark on YARN with Scala. Analyzed the data per business requirements using Hive queries. Developed Spark scripts with Scala shell commands as required.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Converted Oracle data tables into SAS data files using the SAS SQL pass-through facility, and uploaded SAS data files into Oracle tables using the SAS DBLOAD procedure.
- Found outliers, anomalies, and trends in given data sets.
- Assisted in migrating data using Data Pump and the Export/Import utility tools.
- Participated in all phases of research including data collection, data cleaning, data mining, developing models and visualizations.
- Redefined many attributes and relationships and cleansed unwanted tables and columns using SQL queries.
- Utilized the Spark SQL API in PySpark to extract and load data and perform SQL queries (see the sketch after this list).
- Performed data analysis using the ggplot2 library in R to build visualizations for a better understanding of customer behavior.
- Visually plotted data using Tableau for dashboards and reports.
- Implemented statistical modeling with the XGBoost machine learning package in R to determine the predicted probabilities of each model.
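
A sketch of the Spark SQL usage in PySpark mentioned above; the paths, view name, and query are hypothetical:

```python
# Extract data, transform it with plain SQL, and load the result.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-sketch").getOrCreate()

df = spark.read.parquet("/data/claims")  # extract (hypothetical path)
df.createOrReplaceTempView("claims")

summary = spark.sql("""
    SELECT drug_id, AVG(score) AS avg_score
    FROM claims
    GROUP BY drug_id
""")  # transform via the Spark SQL API

summary.write.mode("overwrite").parquet("/data/claims_summary")  # load
```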
Environment: Informatica 9.0, ODS, OLTP, Oracle 10g, SAS, Spark, Scala, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL.
Confidential, Arlington, VA
Data Engineer
Responsibilities:
- Responsible for applying machine learning techniques (regression/classification) to predict outcomes (a minimal sketch appears after this list).
- Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
- Designed the prototype of the data mart and documented its possible outcomes for end users.
- Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Created SQL tables with referential integrity and developed queries using SQL, PLSQL and SQL*PLUS.
- Involved with Data Analysis, primarily identifying Data Sets, Source Data, Source Metadata, Data Definitions, and Data Formats.
- Created PL/SQL packages, Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Participated in Business meetings to understand the business needs & requirements.
- Provided technical and requirements guidance to team members for ETL/SSIS design.
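
A minimal sketch of the prepare/transform/harmonize-then-model flow described above; the input file and column names are hypothetical placeholders:

```python
# Harmonize a raw extract with pandas, then fit a regression on it.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

raw = pd.read_csv("measurements.csv")  # hypothetical extract
clean = (
    raw.rename(columns=str.lower)            # harmonize column naming
       .dropna(subset=["outcome"])           # drop rows without a label
       .assign(value_diff=lambda d: d["value_a"] - d["value_b"])  # derived feature
)

X = clean[["value_a", "value_b", "value_diff"]].fillna(0)
y = clean["outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))
```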
Environment: Machine learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit- Learn/ Scipy/ Numpy/ Pandas), R, SAS, SPSS, Mysql, Eclipse, PL/SQL, SQL connector, Tableau.
Confidential, Charlottesville, VA
Oracle DBA
Responsibilities:
- Installation, configuration and administration of Oracle 10g/11g databases
- Performed database cloning to set up test and development databases.
- Developed PL/SQL Functions, Procedures, Packages, and Triggers.
- Imported data from flat files into the database using SQL*Loader and external tables.
- Implemented Oracle 10g RAC, 11g RAC and Standby Databases including Active Data Guard for High Availability and Disaster Recovery.
- Upgraded and migrated databases from 9i to 10g and from 10g to 11g, applying patches whenever required.
- Developed physical and logical models of the databases using ERWIN
- Migrated databases from Windows to Linux, Sun Solaris to AIX.
- Wrote shell scripts for the automation of backups and routine tasks.
- Performed cold backups, hot backups, logical backups, and RMAN backups (including incremental backups).
- Performed performance monitoring and tuning using Explain Plan, STATSPACK, SQL Trace, TKPROF, Automatic Workload Repository (AWR), and Automatic Database Diagnostic Monitor (ADDM).
Environment: Sun Solaris, Red Hat Linux, Oracle 11g RAC/10g RAC/9i, HP-UX, Windows, OEM (Grid Control), SQL, PL/SQL, TOAD, RMAN, TKPROF, STATSPACK, AWR, ADDM, SQL*Loader, ERWIN.
Confidential
SQL/PLSQL Developer
Responsibilities:
- Responsible for design, development, and implementation of the database schema, creating fact tables and dimension tables with Oracle Designer 6i according to designs that met business requirements.
- Generated server-side PL/SQL scripts for data manipulation and validation, and created various snapshots and materialized views for remote instances.
- Analyzed the business requirements of the project by studying the Business Requirement Specification document.
- Created SQL/PL/SQL packages and triggers and prepared user manuals for the new programs.
- Involved in creating and modifying PL/SQL and SQL*Loader scripts for data extraction and conversion (ETL), and in automating data loading, extraction, and report generation using UNIX shell scripting (a Python stand-in for the loader automation appears after this list).
- Fine-tuned SQL, PL/SQL, procedures, and functions for maximum efficiency in various schemas across databases.
- Worked closely with business owners to make sure requirements were converted correctly to functional specs and the correct logic was implemented in the technical design and code.
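
The loader automation above used UNIX shell scripts; the Python stand-in below shows the same pattern of invoking SQL*Loader once per incoming data file (the directory, control file, and ORACLE_USERID variable are hypothetical):

```python
# Run sqlldr for every .dat file landed in a staging directory.
import os
import subprocess
from pathlib import Path

DATA_DIR = Path("/data/incoming")     # hypothetical landing directory
USERID = os.environ["ORACLE_USERID"]  # e.g. user/password@db, kept out of the script

for data_file in sorted(DATA_DIR.glob("*.dat")):
    log_file = data_file.with_suffix(".log")
    subprocess.run(
        ["sqlldr", f"userid={USERID}", "control=load_emp.ctl",
         f"data={data_file}", f"log={log_file}"],
        check=True,  # abort the batch if any load fails
    )
```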
Environment: Oracle 10g/9i, Oracle Designer 6i, SQL, PL/SQL, SQL Navigator, UNIX shell scripting, Perl, Autosys, and Windows XP/7.
Confidential
SQL/PLSQL developer
Responsibilities:
- Participated in analysis, design, development, testing, and implementation of various financial systems using Oracle 8i, Developer 2000, and PL/SQL.
- Wrote UNIX shell scripts to run database jobs on the server side.
- Defined database structure, mapping, and transformation logic; created external table scripts for loading data from sources for ETL (Extract, Transform, Load) jobs (a sketch follows this list).
- Used TOAD and SQL Navigator extensively.
- Worked with various functional experts to translate their functional knowledge into business rules implemented as working code modules such as procedures and functions.
- Developed new and modified existing packages, database triggers, stored procedures, and other code modules using PL/SQL in support of business requirements.
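
A hedged sketch of an external-table script like those described above, issued here through the python-oracledb driver as a modern stand-in for the tooling of the time; the table, the data_dir DIRECTORY object, and the connection details are all hypothetical:

```python
# Create an Oracle external table so a flat file becomes queryable as a table.
import os
import oracledb

conn = oracledb.connect(
    user="etl_dev",                       # hypothetical account
    password=os.environ["ETL_PASSWORD"],  # kept out of the source
    dsn="dbhost/ORCLPDB1",                # hypothetical service name
)

# CSV rows in emp.dat become queryable as emp_ext without a load step.
ddl = """
CREATE TABLE emp_ext (
    empno NUMBER,
    ename VARCHAR2(30)
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
    )
    LOCATION ('emp.dat')
)"""

with conn.cursor() as cur:
    cur.execute(ddl)
conn.close()
```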
Environment: Oracle 8i, SQL, PL/SQL, ETL, TOAD, UNIX.