Hadoop/Spark Developer Resume
St. Louis, Missouri
SUMMARY
- Around 8 years of experience in the IT industry across the complete software development life cycle (SDLC), including business requirements gathering, system analysis and design, data modeling, development, testing, and implementation of projects.
- Around 5 years of experience in the development, implementation, and configuration of Hadoop ecosystem tools such as HDFS, MapReduce, Hive, Oozie, Sqoop, NiFi, Kafka, ZooKeeper, Elasticsearch, Knox, Ranger, Cassandra, HBase, MongoDB, Spark Core, Spark Streaming, Spark DataFrames, and Spark MLlib.
- Experienced with Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
- Created a roadmap for a data lake that streams in data from multiple sources and enables analytics on the lake using standard BI tools.
- Imported data from source HDFS into Spark DataFrames for in-memory computation to generate optimized output and better visualizations.
- Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data; performed data transformations using Spark Core and converted RDDs to DataFrames.
- Performed join and union operations on customer address data held in different source tables, such as HBase (a minimal sketch follows this summary).
- Experience migrating several databases from an on-premises data center to Cassandra.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, with Pig and Hive jobs.
- Wrote shell scripts that run multiple Hive jobs to incrementally load different Hive tables used to generate reports in Tableau.
- Worked on MongoDB using CRUD (Create, Read, Update, Delete), indexing, replication, and sharding features.
- Experienced with various relational database management systems, including Teradata, PostgreSQL, DB2, Oracle, and SQL Server.
- Extensive experience working with structured data using HiveQL, performing join operations, writing custom UDFs, and optimizing Hive queries.
- Implemented a cluster for the NoSQL tool HBase as part of a proof of concept.
- Experienced with NiFi for automating data movement between different Hadoop systems.
- Experienced in cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, and Redshift, as well as with Microsoft Azure.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Good understanding of security requirements for Hadoop and its integration with Kerberos authentication and authorization infrastructure.
- Expertise in the ETL tool Talend and in ETL concepts.
- Optimized the configurations of MapReduce, Pig, and Hive jobs for better performance.
- Involved in developing distributed enterprise and web applications using UML, Java/J2EE, and web technologies including EJB, JSP, Servlets, JMS, JDBC, JPA, HTML, XML, Tomcat, and Spring.
- Experience testing MapReduce programs with MRUnit and JUnit.
- Extensively worked with Spark Streaming and Apache Kafka to ingest live streaming data.
- Experience with message brokers such as RabbitMQ and Kafka.
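The customer/address join work called out above can be illustrated with a minimal Spark DataFrame sketch. This is not the original code: the HDFS paths, column names, and the status filter are hypothetical placeholders, and the real jobs also read from sources such as HBase.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class CustomerAddressJoinSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("customer-address-join-sketch")
                .getOrCreate();

        // Load the source data from HDFS into DataFrames for in-memory computation.
        Dataset<Row> customers = spark.read().parquet("hdfs:///data/customers");   // hypothetical path
        Dataset<Row> addresses = spark.read().parquet("hdfs:///data/addresses");   // hypothetical path

        // Join customer records with their addresses and keep only active rows.
        Dataset<Row> joined = customers
                .join(addresses, customers.col("customer_id").equalTo(addresses.col("customer_id")))
                .filter(col("status").equalTo("ACTIVE"));                           // hypothetical filter

        joined.write().mode("overwrite").parquet("hdfs:///data/customer_addresses");
        spark.stop();
    }
}
```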
TECHNICAL SKILLS
Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase, Apache Pig, Hive, Sqoop, Apache Impala, Oozie, YARN, Apache Flume, Kafka, ZooKeeper
OS: RedHat Linux, UNIX, Ubuntu, CentOS, Windows 7/8/XP
Programming Languages: Java, Scala, Python, SQL, PL/SQL, Shell Scripting, JSP, Servlets
Cloud Platform: Amazon AWS, EC2, S3, MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake, Data Factory
IDE and Tools: Eclipse 4.7, NetBeans 8.2, IntelliJ, Maven
Databases: Oracle, SQL Server, Teradata; NoSQL: HBase 1.4, Cassandra 3.11, MongoDB, Accumulo
Web Technologies: HTML, CSS, JavaScript, jQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX
Web/Application Server: Apache Tomcat 9.0.7, JBoss, WebLogic, WebSphere
Security: LDAP, AD, Kerberos, Apache Knox
PROFESSIONAL EXPERIENCE
Confidential - St. Louis, Missouri
Hadoop/Spark Developer
Responsibilities:
- Responsible for designing and implementing data pipelines using big data tools including Hive, Oozie, Spark, Sqoop, and Kafka, along with AWS EC2, S3, and EMR.
- Loaded and transformed large sets of structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
- Responsible for performing sort, join, aggregation, and filter transformations.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources.
- Hands-on experience designing and developing Spark applications in Scala to compare Spark's performance with Hive.
- Used Sqoop to extract and load incremental and non-incremental data from RDBMS sources into Hadoop.
- Involved in converting JSON data into DataFrames and storing them in Hive tables.
- Experienced in transferring data from different sources into HDFS using Kafka producers, consumers, and brokers.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (a minimal sketch follows this role's environment list).
- Created multiple groups and set permission policies for them in AWS.
- Performed ETL operations using Pig to join, clean, aggregate, and analyze data.
- Wrote a JMS application that connects to a message queue and delivers data to a message listener.
- Sent XML messages to a Kafka topic using a Kafka producer configured with the Kafka bootstrap servers.
- Wrote Java JUnit test cases for the Kafka producer.
- Retrieved incremental data from Hive and HBase on a daily basis.
- Created a Hive plugin in Apache Drill and exposed it to a BI tool for low-latency queries by business users.
- Used the Parquet file format with Snappy compression and addressed the Hive small-files problem using the hive.merge.mapfiles and hive.merge.mapredfiles parameters.
- Extensively used Hive optimization techniques like partitioning, bucketing, Map Join and parallel execution
- Expertise in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Converted existing Sqoop and Hive jobs into Spark SQL applications that read data from Oracle over JDBC and write it to Hive tables.
- Analyzed SQL scripts and designed solutions implemented with Spark in Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames; developed shell scripts to remove orphan partitions from Hive tables and to handle archive retention in HDFS.
- Explored Spark to improve the performance of existing Hadoop jobs using SparkContext, Spark SQL, Spark Streaming, DataFrames, and pair RDDs.
- Built real-time predictive analytics capabilities using Spark Streaming and Spark SQL.
- Validated fact table data migrated as part of the daily load.
- Used AWS EMR (Elastic Map Reduce) for resource intensive transformation jobs
- Automated the process for extraction of data from warehouses and weblogs by developing work-flows and coordinator jobs in Oozie.
- Designed and modified database tables and used HBase queries to insert and fetch data.
Environment: Hive, Spark, S3, AWS, SQL, DB2, Tableau, Git, ZooKeeper, YARN, Unix shell scripting, HBase, Elastic MapReduce, Spark Streaming, Scala, Apache Oozie
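The Spark Streaming consumer referenced above can be sketched as follows. This is a minimal illustration under assumptions, not the production job: the broker address, topic name, group id, and output path are hypothetical, and the real application wrote the processed stream to HBase rather than to timestamped text files.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaStreamSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-stream-sketch");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");           // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "spark-consumer-sketch");           // hypothetical group id
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Arrays.asList("customer-events"), kafkaParams)); // hypothetical topic

        // Pull out the message payload and persist each micro-batch to HDFS;
        // the production job wrote these processed records to HBase instead.
        stream.map(record -> record.value())
              .foreachRDD(rdd -> rdd.saveAsTextFile(
                      "hdfs:///data/streams/customer-events/" + System.currentTimeMillis()));

        jssc.start();
        jssc.awaitTermination();
    }
}
```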
Confidential, Carrollton, TX
Hadoop Developer
Responsibilities:
- Used Kafka brokers with a Spark context to process live streaming data as RDDs.
- Stored the schemas of incoming data sources in the Kafka Schema Registry for use by downstream applications.
- Implemented NiFi on Hortonworks (HDP 2.4) and recommended a solution to ingest data from multiple sources into HDFS and Hive using NiFi.
- Monitored all NiFi flows to receive notifications whenever no data passes through a flow for longer than a specified time.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Wrote multiple Spark jobs to perform data quality checks before files were moved to the data processing layer (a minimal sketch follows this role's environment list).
- Responsible for loading data pipelines from web servers using Sqoop together with Kafka and the Spark Streaming API.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume 1.7.0.
- Implemented custom Flume interceptors to filter data and defined channel selectors to multiplex data into different sinks.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Developed shell scripts for adding dynamic partitions to Hive staging tables, verifying JSON schema changes in source files, and detecting duplicate files in the source location.
- Involved in deploying applications in AWS and maintaining EC2 (Elastic Compute Cloud) and RDS (Relational Database Service) instances.
- Imported and exported data between relational databases such as MySQL and HDFS using Sqoop.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Modeled and created consolidated Cassandra, FiloDB, and Spark tables based on data profiling.
- Used Oozie operational services for batch processing and dynamic workflow scheduling, and created UDFs to store specialized data structures in Cassandra.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, and DataFrames.
- Developed Spark SQL queries for statistical summaries and filtering/aggregation operations for specific use cases, working with Spark RDDs on a distributed Apache Spark cluster.
- Installed, configured and managed Cassandra database and performed read / writes using Java JDBC connectivity.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and Scala
- Read Hive data into Spark DataFrames and applied data processing and transformation techniques per business requirements.
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Oozie, Sqoop, Kafka, Flume, Oracle 11g, Core Java, FiloDB, Spark, Scala, Hortonworks, Eclipse, Unix/Linux, AWS, ZooKeeper.
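The data-quality checks mentioned above can be illustrated with a minimal Spark SQL sketch. The table names and the specific rules (null and duplicate key checks on order_id) are hypothetical; the actual jobs applied the project's own rules before promoting files to the processing layer.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class DataQualityCheckSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("dq-check-sketch")
                .enableHiveSupport()
                .getOrCreate();

        // Read the staged Hive table produced by the upstream ingestion.
        Dataset<Row> orders = spark.table("staging.orders");            // hypothetical table

        // Reject the batch when required keys are missing or duplicated.
        long nullKeys = orders.filter(col("order_id").isNull()).count();
        long dupKeys  = orders.groupBy("order_id").count()
                              .filter(col("count").gt(1)).count();

        if (nullKeys > 0 || dupKeys > 0) {
            throw new IllegalStateException("Data quality check failed: "
                    + nullKeys + " null keys, " + dupKeys + " duplicated keys");
        }

        // Only clean batches are promoted to the processing layer.
        orders.write().mode("append").saveAsTable("processed.orders");  // hypothetical target
        spark.stop();
    }
}
```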
Confidential - Jessup, PA
Hadoop/Java Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Monitored workload, job performance and capacity planning using Cloudera Manager
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile devices, and network devices, and pushed it into HDFS.
- Used Apache Solr for search applications
- Built a POC enabling member and suspect search using Solr.
- Worked on Impala for creating views for business use-case requirements on top of the Hive tables
- Migrated existing SQL queries to HiveQL queries to move to big data analytical platform
- Integrated the Cassandra file system with Hadoop using MapReduce to perform analytics on Cassandra data.
- Implemented real-time analytics on Cassandra data using the Thrift API.
- Analyzed the relationship of input keys to output keys in terms of type and number, identifying the number, type, and value of keys and values emitted by the mappers and reducers, as well as the number and contents of the output files.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation across file formats including XML, JSON, and CSV (a minimal sketch follows this role's environment list).
- Involved in loading data from UNIX file system to HDFS
- Loaded and transformed large data sets into HDFS using Hadoop fs commands.
- Developed the configuration files and classes specific to Spring and Hibernate.
- Utilized the Spring Framework for bean wiring and dependency injection.
- Scheduled Oozie workflow engine to run multiple Hive and Pig jobs, which independently run with time and data availability
- Implemented UDFs and UDAFs in Java and Python for Hive to handle processing that cannot be done with Hive's built-in functions.
- Applied various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop for analysis, visualization, and report generation.
- Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
Environment: Java, HDFS, Cassandra, MapReduce, Sqoop, JUnit, HTML, JavaScript, Hibernate, Spring, Pig, Hive, Tomcat servers, WebSphere application server.
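The data-cleaning MapReduce work described above can be sketched as a map-only Java job. This is an illustrative sketch under assumptions: the expected field count and the CSV input are hypothetical stand-ins for the project's actual cleaning rules.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CsvCleanJob {

    // Map-only job: drop rows that do not have the expected number of fields.
    public static class CleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 8;   // hypothetical field count

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length == EXPECTED_FIELDS) {
                ctx.write(NullWritable.get(), value);    // keep only well-formed rows
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "csv-clean-sketch");
        job.setJarByClass(CsvCleanJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);                        // cleaning needs no reduce phase
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```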
Confidential - Miami, FL
Java Developer
Responsibilities:
- Developed the presentation tier using the Spring Framework, which supports rapid MVC application development.
- Used the Spring Framework for dependency injection and integrated it with the JSF framework and Hibernate.
- Developed stored procedures, triggers for efficient interaction with MySQL.
- Implemented Ajax calls using JSF-Ajax integration and implemented cross-domain calls using jQuery Ajax methods
- Worked with Swing and RCP using Oracle ADF to develop a search application as part of a migration project.
- Used JUnit for unit testing and SoapUI for web service testing.
- Used Spring JDBC to interact with the DB2 mainframe database (a minimal sketch follows this role's environment list).
- Used Maven as the project build tool and Git for version control.
- Participated in code reviews after every sprint.
- Designed the APIs using Spring RESTful web services, following MVC architecture.
- Used two-way SSL for authentication and HMAC with SHA-1 for security.
- Involved in integration testing, defect tracking, and defect fixing, and participated in system testing.
- Used Spring with the Quartz Scheduler API to run batch jobs.
- Used multithreading for asynchronous execution of parts of the flow.
- Good understanding of Federation and Federated applications and MFA.
- Used WAS 7.1 for deploying applications and Maven as the build tool.
- Implemented Oracle database access through drivers
- Developed SQL stored procedures for usage within message flows
- Implemented J2EE Design Patterns like MVC, DAO and Singleton
- Designed and implemented automated black-box testing scripts using Perl and shell scripts.
Environment: J2EE, Java, Spring Framework, JDBC, JUnit, JSON, Servlets, Spring RESTful web services, JAXB, WAS 7.1, DB2, PL/SQL, Maven, Rally, Windows 7, Quartz Scheduler, Log4j, Splunk, Unix shell scripting, Putty.
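The Spring JDBC access described above can be illustrated with a minimal DAO sketch. The table, column names, and the Account value object are hypothetical; the real DAOs ran against the DB2 mainframe schemas referenced in this role.

```java
import java.util.List;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;

public class AccountDao {

    private final JdbcTemplate jdbcTemplate;

    public AccountDao(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // Simple value object for the sketch (hypothetical shape).
    public static class Account {
        public final long id;
        public final String name;
        public Account(long id, String name) { this.id = id; this.name = name; }
    }

    // Query the database through JdbcTemplate and map each row onto the value object.
    public List<Account> findByStatus(String status) {
        return jdbcTemplate.query(
                "SELECT ACCOUNT_ID, ACCOUNT_NAME FROM ACCOUNTS WHERE STATUS = ?",   // hypothetical table
                (rs, rowNum) -> new Account(rs.getLong("ACCOUNT_ID"), rs.getString("ACCOUNT_NAME")),
                status);
    }
}
```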
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in daily Scrum (Agile) meetings, sprint planning, and estimation of tasks for user stories; participated in retrospectives and presented demos at the end of each sprint.
- Involved in front-end development using JSP, HTML, CSS, JavaScript, and jQuery.
- Extensively used Eclipse IDE for developing, debugging, integrating and deploying the application.
- Implemented MVC architecture using JSP, Spring, Hibernate and used Spring Framework to initialize managed beans and services.
- Involved in development of Agent Verification System using Spring MVC framework.
- Used Spring Inheritance to develop beans from already developed parent beans.
- Used Spring AOP for logging, auditing, and transaction management to separate business logic from cross-cutting concerns (a minimal sketch follows this role's environment list).
- Used Spring Security for Authentication and Authorization of the application.
- Created data model and generated Hibernate mappings and domain objects
- Interfaced with the MySQL back-end database by integrating Spring with Hibernate.
- Extensively used Hibernate named queries, criteria queries, Hibernate Query Language (HQL), optimistic locking, and caching to process data from the database.
- Developed Unit /Integration test cases using JUnit.
- Used Gradle for building and deploying web applications on WebLogic Server.
- Implemented Log4j, enabling logging at runtime without modifying the application binary.
- Used JIRA for tracking story progress and following the Agile methodology.
- Used logging techniques provided by Log4J tool for efficient logging and debugging.
- Developed the application using Eclipse as the IDE and used its features for editing, debugging, compiling, formatting, build automation and version control (SVN)
- Involved in Maintenance and Enhancements for the project.
- Involved in bug fixing and resolving issues with the QA and production environment during production support
Environment: Java, J2EE, HTML, CSS, JavaScript, jQuery, Struts, Spring IOC, Spring MVC, Spring AOP, JDBC, Hibernate, MySQL, HQL, SQL, JUnit, Gradle, JIRA, Log4J, Eclipse, SVN and WebLogic Server.
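The Spring AOP usage described above can be illustrated with a minimal logging aspect. The pointcut package (com.example.service) is a hypothetical placeholder; the real aspects also covered auditing and transaction management.

```java
import java.util.Arrays;

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class ServiceLoggingAspect {

    private static final Logger log = LoggerFactory.getLogger(ServiceLoggingAspect.class);

    // Log every call into the service layer without touching the business code itself.
    @Before("execution(* com.example.service..*.*(..))")   // hypothetical base package
    public void logServiceCall(JoinPoint joinPoint) {
        log.info("Entering {} with args {}",
                 joinPoint.getSignature().toShortString(),
                 Arrays.toString(joinPoint.getArgs()));
    }
}
```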
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in the Software Development Life Cycle (SDLC) including Requirement Analysis, Design, Implementation, Testing and Maintenance.
- Involved in checking Java code versions into the SCM repository.
- Used UML modeling diagrams to describe the structure and behavior of the system.
- Utilized pair programming approach to ensure high quality code.
- Applied design patterns including MVC Pattern & Abstract Factory Pattern
- Developed Java Classes for implementation of Persistence of objects and Caching of Data using JDBC.
- Used JSP, JS, JQuery, Servlets, EJB, CSS, Struts
- Developed user interfaces using JSPs, Struts, HTML, CSS, JavaScript, JSP Custom Tags
- Used connection pooling to obtain JDBC connections and access database procedures (a minimal sketch follows this role's environment list).
- Involved in unit, integration, and system testing using the JUnit framework.
- Converted mock-ups into hand-written HTML, CSS 2, XHTML, JavaScript, jQuery, XML, and JSON.
- Deployed the application in JBoss Application Server
- Used SVN for version control and Log4J to store log messages
Environment: Java 1.5, J2EE, JSP, Struts, JavaScript, JBoss, AJAX, HTML, CSS, JDBC, Eclipse, Restful Web Services, AngularJS, WSDL, Windows, JSF, SOA, JSON, Design patterns, JUnit, JQuery, SOAP.
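The connection-pooling usage described above can be sketched with a servlet that borrows a pooled JDBC connection from a container-managed JNDI DataSource. The JNDI name jdbc/appDS and the health-check behavior are hypothetical; the real code called the application's database procedures.

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.SQLException;

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

public class DbHealthCheckServlet extends HttpServlet {

    // Borrow a pooled JDBC connection from the container-managed DataSource
    // instead of opening a new physical connection per request.
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        try {
            DataSource ds = (DataSource) new InitialContext()
                    .lookup("java:comp/env/jdbc/appDS");             // hypothetical JNDI name
            try (Connection conn = ds.getConnection()) {
                resp.getWriter().write(conn.isValid(2) ? "OK" : "DB UNAVAILABLE");
            }
        } catch (NamingException | SQLException e) {
            throw new ServletException("Database availability check failed", e);
        }
    }
}
```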