Data Scientist Resume
PROFESSIONAL EXPERIENCE:
Data Scientist
Confidential
Responsibilities:
- Extracted data from HDFS and prepared it for exploratory analysis using data munging.
- Built models using Statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
- Completed a highly immersive data science program covering data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, MongoDB, and Hadoop.
- Set up storage and data analysis tools in AWS cloud computing infrastructure.
- Installed and used Caffe Deep Learning Framework
- Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
- Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
- Used pandas, NumPy, seaborn, matplotlib, scikit-learn, SciPy, and NLTK in Python for developing various machine learning algorithms.
- Implemented Agile Methodology for building an internal application.
- Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems.
- Programmed a utility in Python that used multiple packages (numpy, scipy, pandas)
- Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, Naive Bayes, KNN.
- As an Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
- Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
- Used Teradata utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
- Validated the machine learning classifiers using ROC curves and lift charts (see the classification sketch after this list).
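The classification and ROC-validation work above can be summarized with a minimal scikit-learn sketch; the synthetic data, Random Forest settings, and train/test split below are illustrative assumptions, not the projects' actual features or parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared HDFS extracts described above.
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Random Forest as one of the classifiers mentioned; XGBoost and SVM follow the same fit/predict pattern.
clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)

# Validate with an ROC curve and AUC, as in the validation bullet above.
scores = clf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))
```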
Data Scientist
Confidential
Responsibilities:
- Utilized Spark, Scala, Hadoop, HQL, VQL, Oozie, PySpark, Data Lake, TensorFlow, HBase, Cassandra, Redshift, MongoDB, Kafka, Kinesis, Spark Streaming, Edward, CUDA, MLlib, AWS, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
- Utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.
- Applied various machine learning algorithms and statistical models, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python and MATLAB.
- Worked on analyzing data from Google Analytics, AdWords, Facebook etc.
- Evaluated models using cross-validation, log loss, and ROC curves, used AUC for feature selection, and worked with Elastic technologies such as Elasticsearch and Kibana.
- Performed data profiling to learn about behavior across features such as traffic pattern, location, and date and time.
- Categorized comments from different social networking sites into positive and negative clusters using sentiment analysis and text analytics.
- Performed multinomial logistic regression, decision tree, random forest, and SVM modeling to classify whether a package would be delivered on time on a new route.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
- Used the K-Means clustering technique to identify outliers and to classify unlabeled data (see the sketch after this list).
- Ensured that the model had a low false positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
- Used Principal Component Analysis in feature engineering to analyze high dimensional data.
- Used MLlib, Spark's machine learning library, to build and evaluate different models.
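A minimal sketch of the PCA plus K-Means outlier-flagging approach described above, using scikit-learn on synthetic data; the component count, cluster count, and percentile cutoff are illustrative assumptions rather than the project's actual values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 15))  # stand-in for high-dimensional profile data

# Reduce dimensionality with PCA before clustering.
X_scaled = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=5).fit_transform(X_scaled)

# Cluster the unlabeled data with K-Means.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_pca)

# Flag points far from their assigned centroid as candidate outliers.
dist = np.linalg.norm(X_pca - kmeans.cluster_centers_[kmeans.labels_], axis=1)
outliers = dist > np.percentile(dist, 99)  # illustrative 99th-percentile cutoff
print("flagged outliers:", int(outliers.sum()))
```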
Data Analyst
Confidential
Responsibilities:
- Implemented the EP Data Lake, which provides a platform to manage data in a central location so that anyone in the firm can rapidly query, analyze, or refine the data in a standard way.
- Involved in moving legacy data from the Sybase data warehouse to the Hadoop Data Lake and migrating the data processing to the lake.
- Responsible for creating Data store, Datasets and Virtual Warehouse in the lake and then creating Spark and Hive refiners to implement the existing SQL Stored Procedures.
- Created Java based Spark refiners to replace existing SQL Stored Procedures.
- Created Hive refiners for simple UNIONS and JOINS.
- Experienced in executing Hive queries using Spark SQL, which integrates with the Spark environment (see the sketch after this list).
- Implemented near real time data pipeline using framework based on Kafka, Spark and MemSQL.
- Used REST services in Java and Spring to expose data in the lake.
- Automated the triggering of Data Lake REST API calls using Unix shell scripting and Perl.
- Created reconciliation jobs for validating data between source and lake.
- Used Scala to test Dataframe transformations and debugging issues with data.
- Redesigned and implemented Scala REPL (read-evaluate-print-loop) to tightly integrate with other IDE features in Eclipse.
- Developed multiple POCs using Scala and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
- Used Avro format for staging data and ORC for final repository.
- Worked on the in-house data modeling service (PURE MODEL); used data from the data lake virtual warehouse and exposed the model output through Java web services consumed by end users.
- Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.
- Used Sqoop import and export functionalities to handle large data set transfer between Sybase database and HDFS.
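A minimal PySpark sketch of the kind of Spark SQL/Hive refiner described above; the database, table, and column names (lake.trades, lake.positions, account_id, notional) are placeholders, not the actual lake datasets.

```python
from pyspark.sql import SparkSession

# Hive-enabled session so Spark SQL can read the lake's Hive tables.
spark = (SparkSession.builder
         .appName("example-refiner")
         .enableHiveSupport()
         .getOrCreate())

# Placeholder tables standing in for lake datasets; the real refiners
# reimplemented existing SQL stored procedures.
trades = spark.table("lake.trades")
positions = spark.table("lake.positions")

# Simple join/aggregate refiner, analogous to a UNION/JOIN Hive refiner.
refined = (trades.join(positions, on="account_id", how="inner")
                 .groupBy("account_id")
                 .agg({"notional": "sum"}))

# Final repository used ORC, per the staging/repository bullet above.
refined.write.mode("overwrite").format("orc").saveAsTable("lake.account_notional")
```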
Hadoop Developer
Confidential
Responsibilities:
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Analyzed data using Hadoop Components Hive and Pig.
- Worked on Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in loading data from UNIX file system to HDFS.
- Involved in development using Cloudera distribution system.
- Worked hands-on with the ETL process.
- Developed Hadoop Streaming jobs to ingest large amounts of data.
- Loaded and transformed large data sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Imported data from Teradata using Sqoop with the Teradata connector.
- Created sub-queries for filtering and faster query execution.
- Created multiple Join tables and fetched the required data.
- Worked with Hadoop clusters using Cloudera (CDH5) distributions.
- Performed data import and export between HDFS and relational database systems using Sqoop.
- Installed and set up HBase and Impala.
- Used Python libraries such as Beautiful Soup, NumPy, and SQLAlchemy (see the sketch after this list).
- Used Apache Impala to read, write, and query Hadoop data in HDFS, HBase, and Cassandra.
- Implemented Partitioning, Dynamic Partitions and Buckets in Hive.
- Supported MapReduce programs running on the cluster.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Bulk loaded data into Oracle using the JDBC template.
- Worked on Python OpenStack APIs and used NumPy for Numerical analysis.
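A minimal sketch of the Beautiful Soup and SQLAlchemy usage noted above; the URL, table schema, and SQLite connection string are illustrative placeholders, and requests is assumed for fetching the page.

```python
import requests
from bs4 import BeautifulSoup
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base  # SQLAlchemy 1.4+

Base = declarative_base()

class Page(Base):
    __tablename__ = "pages"  # placeholder table
    id = Column(Integer, primary_key=True)
    title = Column(String)

# Parse a page title with Beautiful Soup (placeholder URL).
html = requests.get("https://example.com").text
title = BeautifulSoup(html, "html.parser").title.get_text(strip=True)

# Persist the parsed value with SQLAlchemy (SQLite used here for illustration).
engine = create_engine("sqlite:///scraped.db")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(Page(title=title))
    session.commit()
```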
Software Developer
Confidential
Responsibilities:
- Developed Enterprise JavaBeans (EJB) classes to implement various business functionalities (session beans).
- Developed various end-user screens using JSF and Servlet technologies and UI technologies such as HTML, CSS, and JavaScript.
- Performed necessary validations of each screen developed, using AngularJS and jQuery.
- Configured the Spring configuration file to make use of the DispatcherServlet provided by Spring IoC.
- Separated secondary functionality from primary functionality using Spring AOP.
- Used Spring Repository to load data from MongoDB database to implement DAO layer.
- Developed stored procedures for regular cleaning of the database.
- Prepared test cases and provided support to QA team in UAT.
- Consumed web services for transferring data between different applications using RESTful APIs along with the Jersey API and JAX-RS.
- Built the application using TDD (Test Driven Development) approach and involved in different phases of testing like Unit Testing. Responsible for fixing bugs based on the test results.
- Involved in SQL statements, stored procedures, handled SQL Injections and persisted data using Hibernate Sessions, Transactions and Session Factory Objects.
- Responsible for Hibernate Configuration and integrated Hibernate framework.
- Analyzed and fixed the bugs reported in QTP and effectively delivered the bug fixes reported with a quick turnaround time.
- Extensively used Java Collections API like Lists, Sets and Maps.
- Used PVCS for version control and deployed the application on the JBoss server.
- Used SharePoint for collaborative work.
- Implemented the function to send and receive AMQP messages on RabbitMQ synchronously and asynchronously, and send JMS message to Apache ActiveMQ on the edge device.
- Used Maven for building, deploying application and creating JPA based entity objects.
- Involved in configuring JMS and JNDI in Rational Application Developer (RAD).