Hadoop/Spark Developer Resume
St. Louis, Missouri
SUMMARY
- Around 8 years of experience in the IT industry across the complete software development life cycle (SDLC), including business requirements gathering, system analysis and design, data modeling, development, testing, and implementation of projects.
- Around 5 years of experience in the development, implementation, and configuration of Hadoop ecosystem tools such as HDFS, MapReduce, Hive, Oozie, Sqoop, NiFi, Kafka, ZooKeeper, Elasticsearch, Knox, Ranger, Cassandra, HBase, MongoDB, Spark Core, Spark Streaming, Spark DataFrames, and Spark MLlib.
- Experienced with Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
- Created a roadmap for a data lake that streams in data from multiple sources and enables analytics on the lake using standard BI tools.
- Imported data from source HDFS into Spark DataFrames for in-memory computation to generate optimized output and better visualizations.
- Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data; performed data transformations using Spark Core and converted RDDs to DataFrames.
- Performed join and union operations on customer address data held in different source tables, such as HBase (a minimal sketch follows this summary).
- Experience migrating several databases from an on-premises data center to Cassandra.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, with Pig and Hive jobs.
- Wrote shell scripts that run multiple Hive jobs to incrementally load different Hive tables used to generate reports in Tableau.
- Worked on MongoDB using CRUD (Create, Read, Update, Delete), indexing, replication, and sharding features.
- Experienced with various relational database management systems, including Teradata, PostgreSQL, DB2, Oracle, and SQL Server.
- Extensive experience working with structured data using HiveQL, performing join operations, writing custom UDFs, and optimizing Hive queries.
- Implemented a cluster for the NoSQL tool HBase as part of a proof of concept.
- Experienced with NiFi for automating data movement between different Hadoop systems.
- Experienced in cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, and Redshift, as well as with Microsoft Azure.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Good understanding of security requirements for Hadoop and its integration with Kerberos authentication and authorization infrastructure.
- Expertise in the ETL tool Talend and in ETL concepts.
- Optimized the configurations of MapReduce, Pig, and Hive jobs for better performance.
- Involved in developing distributed enterprise and web applications using UML, Java/J2EE, and web technologies including EJB, JSP, Servlets, JMS, JDBC, JPA, HTML, XML, Tomcat, and Spring.
- Experience testing MapReduce programs with MRUnit and JUnit.
- Extensively worked with Spark Streaming and Apache Kafka to ingest live streaming data.
- Experience with message brokers such as RabbitMQ and Kafka.
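The customer/address join work called out above can be illustrated with a minimal Spark DataFrame sketch. This is not the original code: the HDFS paths, column names, and the status filter are hypothetical placeholders, and the real jobs also read from sources such as HBase.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class CustomerAddressJoinSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("customer-address-join-sketch")
                .getOrCreate();

        // Load the source data from HDFS into DataFrames for in-memory computation.
        Dataset<Row> customers = spark.read().parquet("hdfs:///data/customers");   // hypothetical path
        Dataset<Row> addresses = spark.read().parquet("hdfs:///data/addresses");   // hypothetical path

        // Join customer records with their addresses and keep only active rows.
        Dataset<Row> joined = customers
                .join(addresses, customers.col("customer_id").equalTo(addresses.col("customer_id")))
                .filter(col("status").equalTo("ACTIVE"));                           // hypothetical filter

        joined.write().mode("overwrite").parquet("hdfs:///data/customer_addresses");
        spark.stop();
    }
}
```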
TECHNICAL SKILLS
Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase, Apache Pig, Hive, Sqoop, Apache Impala, Oozie, YARN, Apache Flume, Kafka, ZooKeeper
OS: RedHat Linux, UNIX, Ubuntu, CentOS, Windows 7/8/XP
Programming Languages: Java, Scala, Python, SQL, PL/SQL, Shell Scripting, JSP, Servlets
Cloud Platform: Amazon AWS, EC2, S3, MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake, Data Factory
IDE and Tools: Eclipse 4.7, NetBeans 8.2, IntelliJ, Maven
Databases: Oracle, SQL Server, Teradata; NoSQL: HBase 1.4, Cassandra 3.11, MongoDB, Accumulo
Web Technologies: HTML, CSS, JavaScript, jQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX
Web/Application Server: Apache Tomcat 9.0.7, JBoss, WebLogic, WebSphere
Security: LDAP, AD, Kerberos, Apache Knox
PROFESSIONAL EXPERIENCE
Confidential - St. Louis, Missouri
Hadoop/Spark Developer
Responsibilities:
- Responsible for designing and implementing data pipelines using big data tools including Hive, Oozie, Spark, Sqoop, and Kafka, along with AWS EC2, S3, and EMR.
- Loaded and transformed large sets of structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
- Responsible for performing sort, join, aggregation, and filter transformations.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources.
- Hands-on experience designing and developing Spark applications in Scala to compare Spark's performance with Hive.
- Used Sqoop to extract and load incremental and non-incremental data from RDBMS sources into Hadoop.
- Involved in converting JSON data into DataFrames and storing them in Hive tables.
- Experienced in transferring data from different sources into HDFS using Kafka producers, consumers, and brokers.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (a minimal sketch follows this role's environment list).
- Created multiple groups and set permission policies for them in AWS.
- Performed ETL operations using Pig to join, clean, aggregate, and analyze data.
- Wrote a JMS application that connects to a message queue and delivers data to a message listener.
- Sent XML messages to a Kafka topic using a Kafka producer configured with the Kafka bootstrap servers.
- Wrote Java JUnit test cases for the Kafka producer.
- Retrieved incremental data from Hive and HBase on a daily basis.
- Created a Hive plugin in Apache Drill and exposed it to a BI tool for low-latency queries by business users.
- Used the Parquet file format with Snappy compression and addressed the Hive small-files problem using the hive.merge.mapfiles and hive.merge.mapredfiles parameters.
- Extensively used Hive optimization techniques like partitioning, bucketing, Map Join and parallel execution
- Expertise in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Converted existing Sqoop and Hive jobs into Spark SQL applications that read data from Oracle over JDBC and write it to Hive tables.
- Analyzed SQL scripts and designed solutions implemented with Spark in Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames; developed shell scripts to remove orphan partitions from Hive tables and to handle archive retention in HDFS.
- Explored Spark to improve the performance of existing Hadoop jobs using SparkContext, Spark SQL, Spark Streaming, DataFrames, and pair RDDs.
- Built real-time predictive analytics capabilities using Spark Streaming and Spark SQL.
- Validated fact table data migrated as part of the daily load.
- Used AWS EMR (Elastic Map Reduce) for resource intensive transformation jobs
- Automated the process for extraction of data from warehouses and weblogs by developing work-flows and coordinator jobs in Oozie.
- Designed and modified database tables and used HBase queries to insert and fetch data.
Environment: Hive, Spark, S3, AWS, SQL, DB2, Tableau, Git, ZooKeeper, YARN, Unix shell scripting, HBase, Elastic MapReduce, Spark Streaming, Scala, Apache Oozie
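The Spark Streaming consumer referenced above can be sketched as follows. This is a minimal illustration under assumptions, not the production job: the broker address, topic name, group id, and output path are hypothetical, and the real application wrote the processed stream to HBase rather than to timestamped text files.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaStreamSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-stream-sketch");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");           // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "spark-consumer-sketch");           // hypothetical group id
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Arrays.asList("customer-events"), kafkaParams)); // hypothetical topic

        // Pull out the message payload and persist each micro-batch to HDFS;
        // the production job wrote these processed records to HBase instead.
        stream.map(record -> record.value())
              .foreachRDD(rdd -> rdd.saveAsTextFile(
                      "hdfs:///data/streams/customer-events/" + System.currentTimeMillis()));

        jssc.start();
        jssc.awaitTermination();
    }
}
```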
Confidential, Carrollton, TX
Hadoop Developer
Responsibilities:
- Used Kafka brokers with a Spark context to process live streaming data as RDDs.
- Stored the schemas of incoming data sources in the Kafka Schema Registry for use by downstream applications.
- Implemented NiFi on Hortonworks (HDP 2.4) and recommended a solution to ingest data from multiple sources into HDFS and Hive using NiFi.
- Monitored all NiFi flows to receive notifications whenever no data passes through a flow for longer than a specified time.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Wrote multiple Spark jobs to perform data quality checks before files were moved to the data processing layer (a minimal sketch follows this role's environment list).
- Responsible for loading data pipelines from web servers using Sqoop together with Kafka and the Spark Streaming API.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume 1.7.0.
- Implemented custom Flume interceptors to filter data and defined channel selectors to multiplex data into different sinks.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Developed shell scripts for adding dynamic partitions to Hive staging tables, verifying JSON schema changes in source files, and detecting duplicate files in the source location.
- Involved in deploying applications in AWS and maintaining EC2 (Elastic Compute Cloud) and RDS (Relational Database Service) instances.
- Imported and exported data between relational databases such as MySQL and HDFS using Sqoop.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Modeled and created consolidated Cassandra, FiloDB, and Spark tables based on data profiling.
- Used Oozie operational services for batch processing and dynamic workflow scheduling, and created UDFs to store specialized data structures in Cassandra.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, and DataFrames.
- Developed Spark SQL queries for statistical summaries and filtering/aggregation operations for specific use cases, working with Spark RDDs on a distributed Apache Spark cluster.
- Installed, configured and managed Cassandra database and performed read / writes using Java JDBC connectivity.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and Scala
- Read Hive data into Spark DataFrames and applied data processing and transformation techniques per business requirements.
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Oozie, Sqoop, Kafka, Flume, Oracle 11g, Core Java, FiloDB, Spark, Scala, Hortonworks, Eclipse, Unix/Linux, AWS, ZooKeeper.
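The data-quality checks mentioned above can be illustrated with a minimal Spark SQL sketch. The table names and the specific rules (null and duplicate key checks on order_id) are hypothetical; the actual jobs applied the project's own rules before promoting files to the processing layer.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class DataQualityCheckSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("dq-check-sketch")
                .enableHiveSupport()
                .getOrCreate();

        // Read the staged Hive table produced by the upstream ingestion.
        Dataset<Row> orders = spark.table("staging.orders");            // hypothetical table

        // Reject the batch when required keys are missing or duplicated.
        long nullKeys = orders.filter(col("order_id").isNull()).count();
        long dupKeys  = orders.groupBy("order_id").count()
                              .filter(col("count").gt(1)).count();

        if (nullKeys > 0 || dupKeys > 0) {
            throw new IllegalStateException("Data quality check failed: "
                    + nullKeys + " null keys, " + dupKeys + " duplicated keys");
        }

        // Only clean batches are promoted to the processing layer.
        orders.write().mode("append").saveAsTable("processed.orders");  // hypothetical target
        spark.stop();
    }
}
```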
Confidential - Jessup, PA
Hadoop/Java Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Monitored workload, job performance and capacity planning using Cloudera Manager
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile devices, and network devices, and pushed it into HDFS.
- Used Apache Solr for search applications
- Built a POC enabling member and suspect search using Solr.
- Worked on Impala for creating views for business use-case requirements on top of the Hive tables
- Migrated existing SQL queries to HiveQL queries to move to big data analytical platform
- Integrated the Cassandra file system with Hadoop using MapReduce to perform analytics on Cassandra data.
- Implemented real-time analytics on Cassandra data using the Thrift API.
- Analyzed the relationship of input keys to output keys in terms of type and number, identifying the number, type, and value of keys and values emitted by the mappers and reducers, as well as the number and contents of the output files.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation across file formats including XML, JSON, and CSV (a minimal sketch follows this role's environment list).
- Involved in loading data from UNIX file system to HDFS
- Loaded and transformed large data sets into HDFS using Hadoop fs commands.
- Developed the configuration files and classes specific to Spring and Hibernate.
- Utilized the Spring Framework for bean wiring and dependency injection.
- Scheduled Oozie workflow engine to run multiple Hive and Pig jobs, which independently run with time and data availability
- Implemented UDFs and UDAFs in Java and Python for Hive to handle processing that cannot be done with Hive's built-in functions.
- Applied various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop for analysis, visualization, and report generation.
- Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
Environment: Java, HDFS, Cassandra, MapReduce, Sqoop, JUnit, HTML, JavaScript, Hibernate, Spring, Pig, Hive, Tomcat servers, WebSphere application server.
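The data-cleaning MapReduce work described above can be sketched as a map-only Java job. This is an illustrative sketch under assumptions: the expected field count and the CSV input are hypothetical stand-ins for the project's actual cleaning rules.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CsvCleanJob {

    // Map-only job: drop rows that do not have the expected number of fields.
    public static class CleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 8;   // hypothetical field count

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length == EXPECTED_FIELDS) {
                ctx.write(NullWritable.get(), value);    // keep only well-formed rows
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "csv-clean-sketch");
        job.setJarByClass(CsvCleanJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);                        // cleaning needs no reduce phase
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```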
Confidential - Miami, FL
Java Developer
Responsibilities:
- Developed the presentation tier using the Spring Framework, which supports rapid MVC application development.
- Used the Spring Framework for dependency injection and integrated it with the JSF framework and Hibernate.
- Developed stored procedures, triggers for efficient interaction with MySQL.
- Implemented Ajax calls using JSF-Ajax integration and implemented cross-domain calls using jQuery Ajax methods
- Worked with Swing and RCP using Oracle ADF to develop a search application as part of a migration project.
- Used JUnit for unit testing and SoapUI for web service testing.
- Used Spring JDBC to interact with the DB2 mainframe database (a minimal sketch follows this role's environment list).
- Used Maven as the project build tool and Git for version control.
- Participated in code reviews after every sprint.
- Designed the APIs using Spring RESTful web services, following MVC architecture.
- Used two-way SSL for authentication and HMAC with SHA-1 for security.
- Involved in integration testing, defect tracking, and defect fixing, and participated in system testing.
- Used Spring with the Quartz Scheduler API to run batch jobs.
- Used multithreading for asynchronous execution of parts of the flow.
- Good understanding of Federation and Federated applications and MFA.
- Used WAS 7.1 for deploying applications and Maven as the build tool.
- Implemented Oracle database access through drivers
- Developed SQL stored procedures for usage within message flows
- Implemented J2EE Design Patterns like MVC, DAO and Singleton
- Designed and implemented automated black-box testing scripts using Perl and shell scripts.
Environment: J2EE, Java, Spring Framework, JDBC, JUnit, JSON, Servlets, Spring RESTful web services, JAXB, WAS 7.1, DB2, PL/SQL, Maven, Rally, Windows 7, Quartz Scheduler, Log4j, Splunk, Unix shell scripting, Putty.
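The Spring JDBC access described above can be illustrated with a minimal DAO sketch. The table, column names, and the Account value object are hypothetical; the real DAOs ran against the DB2 mainframe schemas referenced in this role.

```java
import java.util.List;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;

public class AccountDao {

    private final JdbcTemplate jdbcTemplate;

    public AccountDao(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // Simple value object for the sketch (hypothetical shape).
    public static class Account {
        public final long id;
        public final String name;
        public Account(long id, String name) { this.id = id; this.name = name; }
    }

    // Query the database through JdbcTemplate and map each row onto the value object.
    public List<Account> findByStatus(String status) {
        return jdbcTemplate.query(
                "SELECT ACCOUNT_ID, ACCOUNT_NAME FROM ACCOUNTS WHERE STATUS = ?",   // hypothetical table
                (rs, rowNum) -> new Account(rs.getLong("ACCOUNT_ID"), rs.getString("ACCOUNT_NAME")),
                status);
    }
}
```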
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in daily Scrum (Agile) meetings, sprint planning, and estimation of tasks for user stories; participated in retrospectives and presented demos at the end of each sprint.
- Involved in front-end development using JSP, HTML, CSS, JavaScript, and jQuery.
- Extensively used Eclipse IDE for developing, debugging, integrating and deploying the application.
- Implemented MVC architecture using JSP, Spring, Hibernate and used Spring Framework to initialize managed beans and services.
- Involved in development of Agent Verification System using Spring MVC framework.
- Used Spring Inheritance to develop beans from already developed parent beans.
- Used Spring AOP for logging, auditing, and transaction management to separate business logic from cross-cutting concerns (a minimal sketch follows this role's environment list).
- Used Spring Security for Authentication and Authorization of the application.
- Created data model and generated Hibernate mappings and domain objects
- Interfaced with the MySQL back-end database by integrating Spring with Hibernate.
- Extensively used Hibernate named queries, criteria queries, Hibernate Query Language (HQL), optimistic locking, and caching to process data from the database.
- Developed Unit /Integration test cases using JUnit.
- Used Gradle for building and deploying web applications on WebLogic Server.
- Implemented Log4j, enabling logging at runtime without modifying the application binary.
- Used JIRA for tracking story progress and following the Agile methodology.
- Used logging techniques provided by Log4J tool for efficient logging and debugging.
- Developed the application using Eclipse as the IDE and used its features for editing, debugging, compiling, formatting, build automation and version control (SVN)
- Involved in Maintenance and Enhancements for the project.
- Involved in bug fixing and resolving issues with the QA and production environment during production support
Environment: Java, J2EE, HTML, CSS, JavaScript, jQuery, Struts, Spring IOC, Spring MVC, Spring AOP, JDBC, Hibernate, MySQL, HQL, SQL, JUnit, Gradle, JIRA, Log4J, Eclipse, SVN and WebLogic Server.
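The Spring AOP usage described above can be illustrated with a minimal logging aspect. The pointcut package (com.example.service) is a hypothetical placeholder; the real aspects also covered auditing and transaction management.

```java
import java.util.Arrays;

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class ServiceLoggingAspect {

    private static final Logger log = LoggerFactory.getLogger(ServiceLoggingAspect.class);

    // Log every call into the service layer without touching the business code itself.
    @Before("execution(* com.example.service..*.*(..))")   // hypothetical base package
    public void logServiceCall(JoinPoint joinPoint) {
        log.info("Entering {} with args {}",
                 joinPoint.getSignature().toShortString(),
                 Arrays.toString(joinPoint.getArgs()));
    }
}
```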
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in the Software Development Life Cycle (SDLC) including Requirement Analysis, Design, Implementation, Testing and Maintenance.
- Involved in checking Java code versions into the SCM repository.
- Used UML modeling diagrams to describe the structure and behavior of the system.
- Utilized pair programming approach to ensure high quality code.
- Applied design patterns including MVC Pattern & Abstract Factory Pattern
- Developed Java Classes for implementation of Persistence of objects and Caching of Data using JDBC.
- Used JSP, JS, JQuery, Servlets, EJB, CSS, Struts
- Developed user interfaces using JSPs, Struts, HTML, CSS, JavaScript, JSP Custom Tags
- Used connection pooling to obtain JDBC connections and access database procedures (a minimal sketch follows this role's environment list).
- Involved in unit, integration, and system testing using the JUnit framework.
- Converted mock-ups into hand-written HTML, CSS 2, XHTML, JavaScript, jQuery, XML, and JSON.
- Deployed the application in JBoss Application Server
- Used SVN for version control and Log4J to store log messages
Environment: Java 1.5, J2EE, JSP, Struts, JavaScript, JBoss, AJAX, HTML, CSS, JDBC, Eclipse, Restful Web Services, AngularJS, WSDL, Windows, JSF, SOA, JSON, Design patterns, JUnit, JQuery, SOAP.
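The connection-pooling usage described above can be sketched with a servlet that borrows a pooled JDBC connection from a container-managed JNDI DataSource. The JNDI name jdbc/appDS and the health-check behavior are hypothetical; the real code called the application's database procedures.

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.SQLException;

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

public class DbHealthCheckServlet extends HttpServlet {

    // Borrow a pooled JDBC connection from the container-managed DataSource
    // instead of opening a new physical connection per request.
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        try {
            DataSource ds = (DataSource) new InitialContext()
                    .lookup("java:comp/env/jdbc/appDS");             // hypothetical JNDI name
            try (Connection conn = ds.getConnection()) {
                resp.getWriter().write(conn.isValid(2) ? "OK" : "DB UNAVAILABLE");
            }
        } catch (NamingException | SQLException e) {
            throw new ServletException("Database availability check failed", e);
        }
    }
}
```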