Big Data Architect / Data Scientist Resume

Jacksonville, FL

SUMMARY

  • More than 17 years of professional IT experience, with 4+ years in Data Science, 7+ years in Big Data analytics, and 8+ years in analysis, architectural design, prototyping, development, integration, and testing of applications.
  • Experience working with Python (pandas, NumPy, scikit-learn, Seaborn, Plotly), OpenCV, PyTorch, Keras, TensorFlow, PySpark, SAP-BDS, Hadoop, AWS (EC2), R (caret, rpart, ggplot, dplyr), Spark (MLlib), Azure ML, SageMaker, and MLOps.
  • Strong knowledge and work experience in machine learning: regression (linear, multivariate, Lasso, Ridge), classification (logistic regression, random forest, KNN, Naive Bayes), clustering (k-means, hierarchical), SVM, neural networks, boosting, and NLP (NLTK).
  • Strong knowledge of Apache Spark, Scala, Python, R, and MLOps.
  • Worked extensively on the Spark Core, Spark SQL, and Spark Streaming modules of Spark.
  • Strong understanding of, and coding experience with, Scala, Python, and R.
  • Experience developing MapReduce programs with Apache Hadoop for analyzing big data per requirements.
  • Experienced with major Hadoop ecosystem projects such as Pig, Hive, and HBase.
  • Extensive experience developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Good working experience using Sqoop to import data from RDBMSs into HDFS and vice versa.
  • Good knowledge of job scheduling and monitoring tools like Oozie and ZooKeeper.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Hortonworks distributions.
  • Developed UML diagrams for object-oriented design: use cases, sequence diagrams, and class diagrams using Rational Rose, Visual Paradigm, and Visio.
  • Working knowledge of databases such as Oracle, MySQL, MS SQL, Teradata, and NoSQL stores.
  • Strong experience in database design and in writing complex SQL queries and stored procedures.
  • Experience building, deploying, and integrating applications with Ant and Maven.
  • Experience developing logging standards and mechanisms based on Log4j.
  • Strong work ethic with the desire to succeed and make significant contributions to the organization.
  • Strong problem-solving, communication, and interpersonal skills; a good team player.
  • Motivated to take on independent responsibility, with the ability to contribute as a productive team member.

TECHNICAL SKILLS

Data Science: Pandas, NumPy, Scikit-Learn, Seaborn, Plotly, R (caret, rpart, ggplot, dplyr), MLlib, PyTorch, Keras, TensorFlow, OpenCV, Deep Learning, MLOps

Machine Learning: Regression (Linear, Multivariate, Lasso, Ridge), Classification (Logistic Regression, Random Forest, KNN, Naive Bayes), Clustering (K-Means, Hierarchical), SVM, Neural Networks, Boosting, NLP (NLTK)

Hadoop/Big Data Technologies: Apache Spark, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Oozie, Zookeeper, Scala, Python, Apache Kafka, Apache Storm

Programming Languages: Python, Scala, R, Java, C/C++, HTML, SQL, PL/SQL, AVS & JVS

Operating Systems: UNIX, Windows, LINUX

Application Servers: IBM WebSphere, Apache Tomcat, WebLogic

Web technologies: JSP, Servlets, JNDI, JDBC, JavaBeans, JavaScript, Web Services

Databases: Oracle, MySQL, MS SQL, Teradata

Java IDE: Eclipse, IBM WebSphere Application Developer, IBM RAD 7.0

Tools: TOAD, SQL Developer, SOAP UI, ANT, Maven, Endur 8.x/10.x/11.x

PROFESSIONAL EXPERIENCE

Big Data Architect / Data Scientist

Confidential, Jacksonville, FL

Environment: Hadoop, YARN, Spark Core, Spark SQL, Data Science, Python, R, Scala, HDFS, Hive, Pig, Java, SQL, Hortonworks HDP 2.5, Sqoop, Oracle, MySQL, Tableau, Elasticsearch, Oozie, Kafka, Flume, Eclipse, AWS, EC2, EMR, Athena, NumPy, Pandas, Scikit-Learn

Responsibilities:

  • Developed and maintained the big data pipeline that transfers and processes data using Apache Spark.
  • Responsible for migrating from Hadoop MapReduce to the Spark framework for in-memory distributed computing on real-time data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala (see the Spark sketch after this list).
  • Implemented data cleansing and validation processes using Python and Scala.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Migrated the needed data from DB2, Oracle, and MySQL into HDFS using Sqoop, and imported flat files of various formats into HDFS.
  • Designed batch ingestion components using Sqoop scripts, and data integration and processing components using shell and Hive scripts.
  • Worked in an Agile development approach.
  • Created estimates and defined sprint stages.
  • Developed a strategy for full and incremental loads using Sqoop.
  • Mainly worked on Hive queries to categorize data of different claims.
  • Wrote custom Hive UDFs in Java where the required functionality was too complex for built-in functions (see the UDF sketch below).
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Monitored system health and logs and responded to any warning or failure conditions.
  • Defined workflows and scheduled all the processes involved using Oozie.
  • Exported the analyzed data to relational databases using Hive for visualization and to generate reports for the BI team.
  • Involved in implementing data science and machine learning models using NumPy, Pandas, Scikit-Learn, PyTorch, Keras, and TensorFlow.
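
A minimal Java sketch of one such Hive-to-Spark conversion (the same pattern applies in Python or Scala); the claims table and column names here are hypothetical:

    // Rewrites a Hive aggregation as Spark SQL / DataFrame transformations.
    // Table name (claims) and columns (claim_type, amount) are hypothetical.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.sum;

    public class ClaimsAggregation {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("ClaimsAggregation")
                    .enableHiveSupport()   // reads the existing Hive warehouse
                    .getOrCreate();

            // Equivalent of: SELECT claim_type, SUM(amount) FROM claims GROUP BY claim_type
            Dataset<Row> totals = spark.table("claims")
                    .groupBy(col("claim_type"))
                    .agg(sum(col("amount")).alias("total_amount"));

            totals.write().mode("overwrite").saveAsTable("claims_by_type");
            spark.stop();
        }
    }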

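For the custom Hive UDFs, a minimal Java sketch following the classic org.apache.hadoop.hive.ql.exec.UDF contract is shown below; the normalization rule itself is hypothetical:

    // A Hive UDF that normalizes a claim code; the rule is hypothetical.
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class NormalizeClaimCode extends UDF {
        private final Text result = new Text();

        // Hive calls evaluate() once per row; returning null keeps NULLs intact.
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            result.set(input.toString().trim().toUpperCase());
            return result;
        }
    }

Once packaged into a JAR, such a function is typically registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL.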
Hadoop Architect / Lead

Confidential, Bellevue, WA

Environment: Hadoop, MapReduce v2 (YARN), HDFS, Hive, Pig, Java, SQL, Hortonworks HDP 2.3, Sqoop, Oracle, MySQL, Tableau, Talend, Elasticsearch, Oozie, Spark Core, Spark SQL, Spark Streaming, Kafka, Flume, Eclipse

Responsibilities:

  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Applied strong hands-on knowledge of Talend for data integration.
  • Migrated the needed data from Oracle and MySQL into HDFS using Sqoop, and imported flat files of various formats into HDFS.
  • Designed batch ingestion components using Sqoop scripts, and data integration and processing components using shell, Pig, and Hive scripts.
  • Proposed an automated system using shell scripts to run the Sqoop jobs.
  • Worked with Flume to move data from servers into HDFS.
  • Developed a data pipeline for data processing using the Spark SQL API.
  • Created estimates and defined sprint stages.
  • Developed a strategy for full and incremental loads using Sqoop.
  • Mainly worked on Hive queries to categorize data of different claims.
  • Integrated the Hive warehouse with HBase.
  • Wrote custom Hive UDFs in Java where the required functionality was too complex for built-in functions.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase, and Hive).
  • Monitored system health and logs and responded to any warning or failure conditions.
  • Defined workflows and scheduled all the processes involved using Oozie.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into big data components (see the streaming sketch after this list).
  • Consumed the data from Kafka using Apache Spark.
  • Developed a data pipeline using Kafka to store data in HDFS.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
  • Implemented a Spark Streaming framework that processes data from Kafka and performs analytics on top of it.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet/ORC format in HDFS.
  • Worked on creating real-time data streaming solutions using Apache Spark / Spark Streaming and Kafka.
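
A minimal Java sketch of the Kafka-to-HDFS streaming flow, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic, group id, and output path are hypothetical:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaToHdfs {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("KafkaToHdfs");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");  // hypothetical broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "learner-model");          // hypothetical group

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                    KafkaUtils.createDirectStream(
                            jssc,
                            LocationStrategies.PreferConsistent(),
                            ConsumerStrategies.<String, String>Subscribe(
                                    Collections.singletonList("learner-events"), kafkaParams));

            // Transform each micro-batch and persist the payloads to HDFS.
            stream.map(ConsumerRecord::value)
                  .foreachRDD(rdd -> {
                      if (!rdd.isEmpty()) {
                          rdd.saveAsTextFile("/data/learner/" + System.currentTimeMillis());
                      }
                  });

            jssc.start();
            jssc.awaitTermination();
        }
    }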

Hadoop Senior Developer

Confidential, Oakbrook, IL

Environment: Hadoop, MapReduce, HDFS, Amazon EC2, CentOS, Hive, Pig, Java (JDK 1.6), SQL, Hortonworks, Sqoop, Oozie, Eclipse, Spark Streaming

Responsibilities:

  • Installed and configured Hadoop; responsible for maintaining the cluster and for managing and reviewing Hadoop log files.
  • Developed simple and complex MapReduce programs in Java for data analysis (see the MapReduce sketch after this list).
  • Worked extensively on Hive and Pig.
  • Worked on large sets of structured, semi-structured, and unstructured data.
  • Developed Pig Latin scripts to explore and transform the data.
  • Used Jupyter notebooks for interactive data processing and visualization.
  • Designed and developed end-to-end IIS log analysis using HDInsight, Hive, and Power BI.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
  • Hands-on experience loading data from the UNIX file system into HDFS.
  • Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Managed and scheduled jobs on the Hadoop cluster using Oozie.
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources (see the Pig UDF sketch below).
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Optimized MapReduce jobs to use HDFS efficiently via various compression mechanisms.
  • Involved in writing, optimizing, and testing Pig Latin scripts.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behaviour.
  • Working knowledge of writing Pig Load and Store functions.
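
A minimal sketch of one such MapReduce program in Java; the tab-delimited input layout (category in the first column) is hypothetical:

    // Counts records per category over tab-delimited input; layout is hypothetical.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CategoryCount {
        public static class CategoryMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text category = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t");
                category.set(fields[0]);     // first column is the category
                ctx.write(category, ONE);
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "category-count");
            job.setJarByClass(CategoryCount.class);
            job.setMapperClass(CategoryMapper.class);
            job.setCombinerClass(SumReducer.class);   // local pre-aggregation
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }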

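For the Java UDFs used from Pig, a minimal EvalFunc sketch follows; the field-cleaning rule is hypothetical:

    // A Pig UDF that trims and upper-cases a field; the rule is hypothetical.
    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class CleanField extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().trim().toUpperCase();
        }
    }

In a Pig script, the JAR would be registered with REGISTER and the function invoked inside a FOREACH ... GENERATE statement.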
Java/Endur Lead

Confidential

Responsibilities:

  • Analyzed requirements based on the Functional Specification Document.
  • Implemented pre- and post-processing scripts and configured them in the Operations Manager.
  • Implemented Param and Main scripts; prepared ad hoc and scheduled tasks.
  • Used the Market Manager and Operations Manager.
  • Used the Admin Manager and Reference Manager.
  • Responsible for analysis and design of specifications.
  • Performed development and functional testing on Endur v10 for change requests against existing functionality.
  • Performed code merges and functional testing for Endur v10.
  • Worked on the UK Assets migration project.

Java & J2EE developer

Confidential

Responsibilities:

  • Involved in developing business domain concepts into use cases, sequence diagrams, class diagrams, component diagrams, and implementation diagrams.
  • Implemented various J2EE design patterns such as Model-View-Controller, Data Access Object, Business Delegate, and Transfer Object.
  • Responsible for analysis and design of the application based on the MVC architecture, using the open-source Struts framework.
  • Involved in configuring Struts and Tiles and developing the configuration files.
  • Developed Struts Action classes and validation classes using the Struts controller component and the Struts validation framework.
  • Developed and deployed UI-layer logic using JSP, XML, JavaScript, and HTML/DHTML.
  • Used the Spring Framework and integrated it with Struts.
  • Involved in configuring web.xml and struts-config.xml according to the Struts framework.
  • Designed a lightweight model for the product using the Inversion of Control principle and implemented it successfully using the Spring IoC container.
  • Used the transaction interceptor provided by Spring for declarative transaction management.
  • Managed dependencies between classes with Spring Dependency Injection to promote loose coupling (see the DAO sketch after this list).
  • Provided database connections using JDBC and developed SQL queries to manipulate the data.
  • Developed DAOs using Spring JdbcTemplate to run performance-intensive queries.
  • Developed an Ant script for auto-generation and deployment of the web service.
  • Wrote stored procedures and used Java APIs to call these procedures.
  • Developed various test cases, such as unit tests, mock tests, and integration tests, using JUnit.
  • Experienced in writing stored procedures, functions, and packages.
  • Used Log4j for logging in the applications.
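
A minimal sketch of a Spring-wired DAO backed by JdbcTemplate, assuming a Spring 3-style varargs query API; the orders table, column names, and query are hypothetical:

    import java.util.List;
    import javax.sql.DataSource;
    import org.springframework.jdbc.core.JdbcTemplate;

    public class OrderDao {
        private final JdbcTemplate jdbcTemplate;

        // The DataSource is injected by the Spring container, keeping the DAO
        // loosely coupled to the connection-pool implementation.
        public OrderDao(DataSource dataSource) {
            this.jdbcTemplate = new JdbcTemplate(dataSource);
        }

        public List<String> findOrderIdsByCustomer(long customerId) {
            return jdbcTemplate.queryForList(
                    "SELECT order_id FROM orders WHERE customer_id = ?",
                    String.class, customerId);
        }
    }

In the XML application context of that era, the DAO and its DataSource would be declared as beans and wired together by the container.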

Java & J2EE developer

Confidential

Environment: Java, JDK 1.5, Servlets, Hibernate, Ajax, Oracle 10g, Eclipse, Apache Ant, Web Services (SOAP), Apache Axis, WebLogic Server, JavaScript, HTML, CSS, XML

Responsibilities:

  • Responsible for gathering and analyzing requirements and converting them into technical specifications.
  • Used Rational Rose for creating sequence and class diagrams.
  • Developed the presentation layer using JSP, Java, HTML, and JavaScript.
  • Used Spring core annotations for dependency injection.
  • Designed and developed convention-based coding utilizing Hibernate's persistence framework and O-R mapping capability to enable dynamic fetching and display of various table data with JSF tag libraries.
  • Designed and developed the Hibernate configuration and the session-per-request design pattern for database connectivity and transaction-scoped session access; used HQL and SQL for fetching and storing data (see the Hibernate sketch after this list).
  • Participated in the design and development of the database schema and entity-relationship diagrams for the backend Oracle tables of the application.
  • Implemented web services with Apache Axis.
  • Designed and developed stored procedures and triggers in Oracle to cater to the needs of the entire application; developed complex SQL queries for extracting data from the database.
  • Designed and built SOAP web service interfaces implemented in Java.
  • Used Apache Ant for the build process.
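
A minimal sketch of the session-per-request pattern with Hibernate and HQL; the Customer entity, its mapping, and the query are hypothetical, and getCurrentSession() assumes a session context is configured in hibernate.cfg.xml:

    import java.util.List;
    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;
    import org.hibernate.cfg.Configuration;

    public class CustomerRepository {
        // Built once from hibernate.cfg.xml; shared across requests.
        private static final SessionFactory SESSION_FACTORY =
                new Configuration().configure().buildSessionFactory();

        public List<?> findByRegion(String region) {
            // getCurrentSession() returns the session bound to the current
            // request/transaction context, the heart of session-per-request.
            Session session = SESSION_FACTORY.getCurrentSession();
            Transaction tx = session.beginTransaction();
            try {
                List<?> customers = session
                        .createQuery("from Customer c where c.region = :region")
                        .setParameter("region", region)
                        .list();
                tx.commit();
                return customers;
            } catch (RuntimeException e) {
                tx.rollback();
                throw e;
            }
        }
    }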

Java & J2EE developer

Confidential

Environment: Java, J2EE, XML, XML Schemas, JSP, HTML, CSS, PL/SQL, JUnit, Log4j, IBM WebSphere Application Server.

Responsibilities:

  • Involved in creating UML diagrams such as class, activity, and sequence diagrams using IBM Rational Rose modeling tools.
  • Involved in developing JSPs and Servlets for different user interfaces.
  • Used Struts action forms and developed Action classes, which act as the navigation controllers in the Struts framework (see the Action sketch after this list).
  • Implemented template-based categorization of presentation content using Struts Tiles; MVC implementation using the Struts framework.
  • Involved in unit testing of various modules based on the test cases.
  • Involved in fixing bugs in various modules raised by the testing teams during the integration testing phase.
  • Involved and participated in code reviews.
  • Used the Log4j logging framework for logging messages.
  • Used Rational ClearCase for version control.
  • Used Rational ClearQuest for bug tracking.
  • Involved in deployment of the application on IBM WebSphere Application Server.
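
A minimal sketch of a Struts 1 Action class acting as a navigation controller; the form, the authentication check, and the forward names (configured in struts-config.xml) are hypothetical:

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.struts.action.Action;
    import org.apache.struts.action.ActionForm;
    import org.apache.struts.action.ActionForward;
    import org.apache.struts.action.ActionMapping;

    public class LoginAction extends Action {
        @Override
        public ActionForward execute(ActionMapping mapping, ActionForm form,
                                     HttpServletRequest request,
                                     HttpServletResponse response) throws Exception {
            LoginForm loginForm = (LoginForm) form;

            // Route to whichever forward is configured in struts-config.xml.
            if (authenticate(loginForm.getUsername(), loginForm.getPassword())) {
                return mapping.findForward("success");
            }
            return mapping.findForward("failure");
        }

        private boolean authenticate(String user, String password) {
            return user != null && !user.isEmpty();  // placeholder check
        }
    }

    // In a separate file: the hypothetical ActionForm carrying the login fields.
    public class LoginForm extends ActionForm {
        private String username;
        private String password;

        public String getUsername() { return username; }
        public void setUsername(String username) { this.username = username; }
        public String getPassword() { return password; }
        public void setPassword(String password) { this.password = password; }
    }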
