
Hadoop/Spark Developer Resume


Bloomington, IL

SUMMARY

  • 8+ years of IT experience as a Big Data/Hadoop Developer in all phases of the Software Development Life Cycle, including hands-on experience in Java/J2EE technologies and Big Data.
  • Experience in Apache Hadoop ecosystem components like HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Impala, HBase, Sqoop, Flume, Oozie and Accumulo.
  • Well versed in Amazon Web Services (AWS) cloud services like EC2, S3, EBS, RDS and VPC.
  • Improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real time streaming at an enterprise level.
  • Profound experience in creating real time data streaming solutions using Apache Spark/Spark Streaming, Kafka.
  • Proficient in Core Java and enterprise technologies such as EJB, Hibernate, Java Web Services, SOAP, REST services, Java threads, Java sockets, Java Servlets, JSP and JDBC.
  • Good exposure to Service Oriented Architectures (SOA) built on Web services (WSDL) using SOAP protocol.
  • Written multiple MapReduce programs in Python for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats.
  • Experience working with the Hadoop ecosystem, plus some experience installing and configuring the Hortonworks and Cloudera (CDH3 and CDH4) distributions.
  • Experience in NoSQL databases such as HBase, MongoDB and Cassandra.
  • Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode and MapReduce programming.
  • Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
  • Extracted and processed streaming log data from various sources and integrated it into HDFS using Flume.
  • Worked extensively with different data sources, including non-relational sources such as XML files (with parsers like SAX and DOM) and relational databases such as Oracle and MySQL.
  • Experience working on Application servers like IBM WebSphere, JBoss, BEA WebLogic and Apache Tomcat.
  • Extensive experience in Internet, client/server technologies using Java, J2EE, Struts, Hibernate, Spring, HTML, HTML5, DHTML, CSS, JavaScript, XML, PERL.
  • Expert in deploying code through web application servers like WebSphere, WebLogic and Apache Tomcat in the AWS cloud.
  • Expertise in core Java, J2EE, Multithreading, JDBC, Hibernate, Shell Scripting, Servlets, JSP, Spring, Struts, EJBs, Web Services, XML, JPA, JMS, JNDI and proficient in using Java APIs for application development.
  • Good working experience in Application and web Servers like JBoss and Apache Tomcat.
  • Experience in writing Pig and Hive scripts and extending the core functionality by writing custom UDFs.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala.
  • Integrated Kafka with Spark Streaming for high throughput and reliability.
  • Worked on Apache Flume for collecting and aggregating large amounts of log data and storing it in HDFS for further analysis.
  • Extensive experience with Agile Development, Object Modeling using UML and Rational Unified Process (RUP).
  • Strong knowledge of Object Oriented Programming (OOP) concepts including the use of Polymorphism, Abstraction, Inheritance and Encapsulation.

TECHNICAL SKILLS

  • Java
  • J2EE
  • Multithreading
  • JDBC
  • Hibernate
  • Shell Scripting
  • Servlets
  • JSP
  • Spring
  • Struts
  • EJBs
  • Web Services
  • XML
  • JPA
  • JMS
  • JNDI
  • HTML
  • HTML5
  • DHTML
  • CSS
  • HDFS
  • MapReduce
  • Spark
  • YARN
  • Kafka
  • Pig
  • Hive
  • Impala
  • HBase
  • Sqoop
  • Flume
  • Oozie
  • Accumulo

PROFESSIONAL EXPERIENCE

Confidential, Bloomington, IL

Hadoop/Spark Developer

Responsibilities:

  • Created end-to-end Spark-Solr applications in Scala to perform various data cleansing, validation, transformation and summarization activities according to requirements.
  • Implemented moving averages, interpolations and regression analysis on input data.
  • Tuned Spark applications to improve performance. Worked collaboratively to manage build-outs of large data clusters and real-time streaming with Spark.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
  • Tuned the performance of Spark jobs by changing configuration properties and using broadcast variables.
  • Good knowledge of Talend DQ and data profiling.
  • Troubleshot, debugged and addressed Talend-specific issues while maintaining the health and performance of the ETL environment.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
  • Implemented Storm to pull feeds from various intranet and extranet B2B sites.
  • Constructed and activated multi-node Storm topologies to run on top of Nimbus/Supervisor daemons.
  • Ran Spark on Treadmill to deploy a cluster from scratch in under a couple of minutes.
  • Responsible for handling streaming data from web server console logs.
  • Developed jobs in Talend Enterprise edition from stage to source, intermediate, conversion and target.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Responsible for gathering the business requirements for the initial POCs to load enterprise data warehouse data into Greenplum databases.
  • Oracle to Greenplum migration: designed an automation script using a PL/SQL procedure to convert Oracle DDL to the Greenplum standard.
  • Extracted data from heterogeneous sources such as flat files, VSAM, Oracle, SQL Server and Greenplum into HDFS using Sqoop.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Designed and improved the internal search engine using big data and Solr/Fusion.
  • Migrated data from various data sources to Solr in stages according to requirements.
  • Used Akka as a framework to create reactive, distributed, parallel and resilient concurrent applications in Scala.
  • Extensively worked on Jenkins for continuous integration and for End to End automation for all build and deployments.
  • Worked with cross-functional consulting teams within the data science and analytics team to design, develop and execute solutions to derive business insights and solve clients' operational and strategic problems.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Converted an Ant application to Gradle.
  • Handled onsite-offshore synchronization, keeping teams at both ends well connected to maintain a smooth project flow and resolve roadblocks.
  • Monitored the ticketing tool for reported issues and incidents and resolved them with the appropriate fix.
  • Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data sets.
  • Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications.
  • Involved in loading and transforming large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries.
  • Continuously monitored and managed the Hadoop/Spark cluster using Cloudera Manager.
  • Applied data science and machine learning techniques using Zeppelin to improve the search engine at a wealth management firm.
  • Used the Talend ETL tool to develop multiple jobs and set up workflows.
  • Worked with Architecture and Development teams to understand usage patterns and workload requirements of new projects to ensure the Hadoop platform can effectively meet the performance requirements and service levels of applications.

Environment: Java 1.8, Scala 2.10.5, Apache Spark 1.6.0, Apache Zeppelin, Greenplum 4.3 (PostgreSQL), Treadmill, CDH 5.8.2, Spring 3.0.4, Ivy 2.0, Gradle 2.13, Hive, Talend, HDFS, YARN, MapReduce, Sqoop 1.4.3, Flume, Solr, UNIX Shell Scripting, Python 2.6, AWS, Kafka, Jenkins, Akka
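
As an illustration of the moving-average and interpolation work listed for this role, here is a minimal sketch in plain Python. It is not the original Spark/Scala implementation, which this resume does not show; the function names `moving_average` and `interpolate_gaps`, the fixed-size window, and the choice of linear interpolation are all assumptions made for the example.

```python
from typing import List, Optional


def moving_average(values: List[float], window: int) -> List[float]:
    """Simple moving average over a fixed-size sliding window (hypothetical helper)."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    # Keep a running total so each step costs O(1) instead of re-summing the window.
    total = sum(values[:window])
    result = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        result.append(total / window)
    return result


def interpolate_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None gaps between known points by linear interpolation (assumed method).

    Leading and trailing gaps have no second neighbour and are left as None.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out
```

On a cluster the same logic would more likely be expressed with window functions over a DataFrame; the plain-Python version is only meant to make the two calculations concrete.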

Confidential, Denver, CO

Sr. Hadoop Developer

Responsibilities:

  • Loaded disparate data sets from different sources into the BDPaaS (Hadoop) environment using Sqoop.
  • Developed UNIX scripts for batch loads and driver code to bring huge amounts of data from relational databases to the big data platform.
  • Ingested data from one tenant to the other. Developed Pig queries to load data into HBase. Leveraged Hive queries to create ORC tables.
  • Created ORC tables to improve performance for reporting purposes.
  • Involved in the coding and integration of several business-critical modules of the CARE application using Java, Spring, Hibernate and REST web services on WebSphere application server.
  • Involved in project to provide eligibility, structure and transactional feeds to River Valley Facets platform where heritage and neighborhood health plans and related commercial products are maintained and administered.
  • Developed web pages using JSPs and JSTL to help end users make online submissions of rebates. Also used XMLBeans for data mapping of XML into Java objects.
  • Worked with Systems Analyst and business users to understand requirements for feed generation.
  • Created Health Allies Eligibility and Health Allies Transactional feed extracts using Hive, HBase, Python and UNIX to migrate feed generation from a mainframe application called CES (Consolidated Eligibility Systems) to big data.
  • Used bucketing concepts in Hive to improve performance of HQL queries.
  • Used numerous user-defined functions in Hive to implement complex business logic in feed generation.
  • Developed Spark scripts by using Scala shell commands.
  • Created a reusable Python script and added it to the distributed cache in Hive to generate fixed-width data files using an offset file.
  • Created a MapReduce program that examines current and prior versions of data in HBase to identify transactional updates. These updates are loaded into Hive external tables, which are in turn referenced by Hive scripts in transactional feed generation.
  • Worked in an Agile methodology using Rally.

Environment: MapR, Sqoop, Hive, Pig, Python, UNIX, HBase, Spark, Rally.
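
The fixed-width feed generation mentioned for this role can be sketched roughly as follows. This is a hedged illustration, not the original script: the offset-file layout (one `name,width` pair per line), the left-justified padding, and the helper names `load_offsets` and `to_fixed_width` are assumptions. The only part taken from Hive itself is that a script used with `TRANSFORM` receives rows as tab-separated lines.

```python
from typing import Iterable, List, Tuple


def load_offsets(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse 'name,width' lines (an assumed offset-file layout) into (name, width) pairs."""
    offsets = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, width = line.split(",")
        offsets.append((name.strip(), int(width)))
    return offsets


def to_fixed_width(fields: List[str], offsets: List[Tuple[str, int]]) -> str:
    """Truncate or space-pad each field to its configured width, left-justified."""
    if len(fields) != len(offsets):
        raise ValueError("expected %d fields, got %d" % (len(offsets), len(fields)))
    return "".join(
        value[:width].ljust(width) for value, (_, width) in zip(fields, offsets)
    )


def convert(rows: Iterable[str], offsets: List[Tuple[str, int]]) -> Iterable[str]:
    """Turn tab-separated rows (as Hive streams them to a script) into fixed-width lines."""
    for row in rows:
        yield to_fixed_width(row.rstrip("\n").split("\t"), offsets)
```

In a streaming setup, `convert` would be fed `sys.stdin` and its output printed line by line, with the offset file shipped alongside the script.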

Confidential, Columbus, OH

Java/Hadoop Developer

Responsibilities:

  • Responsible for business logic using Java and JavaScript, with JDBC for querying the database.
  • Involved in requirement analysis, design, coding and implementation.
  • Worked in an Agile methodology and used JIRA to maintain the project stories.
  • Analyzed large data sets by running Hive queries.
  • Involved in designing and developing the Hive data model, loading it with data and writing Java UDFs for Hive.
  • Handled importing and exporting data into HDFS by developing solutions; analyzed the data using MapReduce and Hive and produced summary results from Hadoop for downstream systems.
  • Used Sqoop to import and export data between the Hadoop Distributed File System (HDFS) and RDBMS.
  • Created Hive tables and loaded data from HDFS into Hive tables as per the requirement.
  • Developed custom MapReduce programs to analyze data and used HQL queries to clean unwanted data.
  • Created components like Hive UDFs for functionality missing in Hive to analyze and process large volumes of data.
  • Worked on various performance optimizations like using the distributed cache for small data sets, partitioning and bucketing in Hive, and map-side joins.
  • Involved in writing complex queries to perform join operations between multiple tables.
  • Actively involved in verifying and testing data in HDFS and Hive tables while Sqooping data from Hive to RDBMS tables.
  • Developed scripts and scheduled AutoSys jobs to filter the data.
  • Monitored AutoSys file watcher jobs, tested data for each transaction and verified whether it ran properly.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Used Impala to pull data from Hive tables.
  • Used Apache Maven 3.x to build and deploy the application to various environments.
  • Installed the Oozie workflow engine to run multiple Hive jobs that run independently based on time and data availability.

Environment: HDFS, Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, Oozie, MongoDB, Java 6/7, Oracle 10g, Subversion, Toad, UNIX Shell Scripting, SOAP, REST services, Agile Methodology, JIRA, AutoSys

Confidential

Java Developer

Responsibilities:

  • Responsible and active in the analysis, design, implementation and deployment of full software development life-cycle (SDLC) of the project.
  • Designed and developed user interface using JSP, HTML and JavaScript.
  • Developed Struts action classes and action forms, performed action mapping using the Struts Framework and performed data validation in form beans and action classes.
  • Involved in multi-tiered J2EE design utilizing MVC architecture (Struts Framework) and Hibernate.
  • Extensively used Struts Framework as the controller to handle subsequent client requests and invoke the model based upon user requests.
  • Involved in system design and development in core Java using Collections and multithreading.
  • Defined search criteria to retrieve customer records from the database, made the required changes and saved the updated information back to the database.
  • Wrote JavaScript validations to validate the fields of the user registration screen and login screen.
  • Developed build and deployment scripts using Apache ANT to customize WAR and EAR files.
  • Used DAO and JDBC for database access.
  • Developed applications with ANT based build scripts.
  • Developed stored procedures and triggers using PL/SQL in order to calculate and update the tables to implement business logic.
  • Designed and developed XML processing components for dynamic menus in the application.
  • Involved in postproduction support and maintenance of the application.

Environment: Oracle 11g, Java 1.5, Struts 1.2, Servlets, HTML, XML, MS SQL Server 2005, J2EE, JUnit, Tomcat 6.

Confidential

SQL/Java Developer

Responsibilities:

  • Involved in database design.
  • Created tables and stored procedures in SQL for data manipulation and retrieval; performed database modification using SQL, stored procedures and views in Oracle 10g.
  • Created User Interface using JSP.
  • Involved in integration testing the Business Logic layer and Data Access layer.
  • Used technologies like JSP, JavaScript, HTML and XML for the presentation tier.
  • Involved in JUnit testing of the application using the JUnit framework.
  • Implemented stored procedures, functions and views to retrieve the data.
  • Used Rational Application Developer (RAD) as Integrated Development Environment (IDE).
  • Used unit testing for all the components using JUnit.
  • Used the Apache Log4j logging framework for trace logging and auditing.
  • Used Asynchronous JavaScript and XML (AJAX) for a better and faster interactive front end.
  • Used IBM WebSphere as the application server.
  • Used IBM Rational ClearCase as the version control system.
  • Responsible for mentoring and working with team members to ensure standards and guidelines are followed and tasks are delivered on time.

Environment: Oracle, MySQL, HTML, SQL, XML, JSP, Servlets, JDBC, Java, Eclipse, UNIX.
