
Hadoop/Spark Developer Resume

Dallas, TX

SUMMARY:

  • Over 8 years of professional experience in IT, including Analysis, Design, Coding, Testing, Implementation and Training in Java and Big Data technologies, working with Apache Hadoop ecosystem components.
  • Extensive experience with major components of the Hadoop ecosystem such as Hadoop MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Storm, YARN, Spark and Scala.
  • Good understanding and working experience on Hadoop Distributions like Cloudera and Hortonworks.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which receives data in near real time (a brief sketch follows this summary).
  • Hands-on working experience in a Linux environment with Apache Tomcat. Used UML to design class diagrams for object-oriented analysis and design.
  • Capturing data from existing databases that provide SQL interfaces using Sqoop.
  • Good Knowledge in creating event-processing data pipelines using flume, Kafka and Storm.
  • Expertise in data transformation and analysis using Spark, Pig and Hive.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase and custom Map Reduce programs in Java.
  • Hands on experience in big data ingestion tools like Flume and Sqoop.
  • Hands on NoSQL database experience with HBase, Cassandra.
  • Experience with ETL and Query big data tools like Pig Latin and Hive QL.
  • Expertise in writing Hadoop jobs for analyzing data using Spark, Hive, Pig and MapReduce.
  • Good understanding of HDFS Designs, Daemons, HDFS High Availability (HA).
  • Hands-on experience with AVRO and Parquet file formats, and with dynamic partitions and bucketing for best practices and performance improvement.
  • Excellent understanding and knowledge of NOSQL databases like HBase, and Mongo DB.
  • Experience in database design using Stored Procedures, Functions and Triggers, and strong experience in writing complex queries for DB2 and SQL Server.
  • Developed Spark SQL programs for handling different data sets for better performance.
  • Good knowledge of creating event-processing data using Spark Streaming.
  • Experience in building web services using both SOAP and RESTful services in Java.
  • Good hands-on experience in configuration, deployment and management of enterprise applications on application servers like WebSphere and JBoss and web servers like Apache Tomcat.
  • Experience in performing Unit testing using Junit and TestNG.
  • Extensive experience in documenting requirements, functional specifications and technical specifications.
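
For illustration, a minimal sketch (in Scala, against the Spark 1.x DStream API) of the kind of Spark Streaming transformation referenced above, assuming a Kafka source as in the event-processing pipelines mentioned; the broker address, topic name and record layout are placeholders rather than details of any specific engagement:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object StreamingSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("learner-model-sketch")
        val ssc  = new StreamingContext(conf, Seconds(10))   // batch interval is a tuning choice

        // Placeholder broker and topic -- adjust to the actual cluster.
        val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
        val lines = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("events")).map(_._2)

        // Example on-the-fly transformation: parse delimited records and count events per user in each batch.
        val countsPerUser = lines.map(_.split(","))
          .filter(_.length > 1)
          .map(fields => (fields(0), 1L))
          .reduceByKey(_ + _)
        countsPerUser.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }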

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Spark Core, Spark Streaming, Spark SQL, Hive, Tez, Pig, Sqoop, Flume, Kafka, Oozie, NiFi, ZooKeeper, Docker

AWS Components: EC2, S3, RDS, Redshift, EMR, DynamoDB, Lambda, SNS, SQS

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, C++, Java, Scala, J2EE, Python, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Sun Solaris, HP-UNIX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, NetBeans, Toad, Maven, SBT, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Hadoop /Spark Developer

Responsibilities:
  • Involved in loading data from Linux and UNIX file systems to HDFS.
  • Wrote Hive UDFs to extract data from staging tables.
  • Analyzed web log data using HiveQL and processed it through Flume.
  • Replaced default Derby metadata storage system for Hive with MySQL system.
  • Executed queries using Hive and developed Map-Reduce jobs to analyze data.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Developed Pig UDFs to preprocess the data for analysis.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, the HBase NoSQL database and Sqoop.
  • Implemented proofs of concept on the Hadoop stack and different big data analytic tools, and migration from different databases.
  • Experienced in performance tuning of Spark Applications for setting right Batch Interval time, correct level of Parallelism and memory tuning.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations during the ingestion process itself.
  • Created Hive tables, loaded the data using Sqoop and worked on them using HiveQL.
  • Responsible for developing custom UDFs, UDAFs and UDTFs in Pig and Hive.
  • Optimized Hive queries using various file formats like JSON, AVRO, ORC and Parquet.
  • Used Spark Streaming APIs to perform necessary transformations and actions on the fly for building the common word2vec data model, which gets the data from Kafka in near real time and persists it into Cassandra.
  • Operated the cluster on AWS using EC2, EMR, S3 and Elasticsearch.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Migrated existing MapReduce programs to Spark using Scala and Python
  • Developed Scala scripts and UDFs using Data Frames in Spark 1.6 for data aggregation, queries and writing data back into the OLTP system through Sqoop.
  • Implemented Spark SQL to connect to Hive, read the data and distribute the processing for high scalability (see the sketch after this list).
  • Analyzed tweet JSON data using the Hive SerDe API to deserialize it and convert it into a readable format.
  • Processed application web logs using Flume and loaded them into Hive for analysis.
  • Implemented RESTful Web Services to interact with Cassandra to store/retrieve the data.
  • Generated detailed design documentation for the source-to-target transformations.
  • Wrote UNIX scripts to monitor data load/transformation.
  • Involved in planning process of iterations under the Agile Scrum methodology
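
A minimal sketch (in Scala) of one possible combination of the Spark SQL-on-Hive and Cassandra persistence steps mentioned above, assuming a Hive table named web_logs and a Cassandra keyspace/table analytics.page_counts reachable through the spark-cassandra-connector; all of these names are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveToCassandraSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hive-to-cassandra-sketch"))
        val sqlContext = new HiveContext(sc)   // connects to the shared Hive metastore

        // Aggregate page hits per URL from a (placeholder) Hive table of parsed web logs.
        val pageCounts = sqlContext.sql(
          "SELECT url, COUNT(*) AS hits FROM web_logs GROUP BY url")

        // Persist the result to Cassandra through the connector's DataSource API.
        pageCounts.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "analytics", "table" -> "page_counts"))
          .mode("append")
          .save()
      }
    }

Going through HiveContext is what lets Spark SQL reuse existing Hive table definitions instead of redefining schemas in code.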

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Spark, Oozie, Zookeeper, AWS, RDBMS/DB, MySQL, CSV, AVRO data files

Confidential, Atlanta, GA

Hadoop Developer

Responsibilities:
  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, the HBase NoSQL database and Sqoop.
  • Implemented proofs of concept on the Hadoop stack and different big data analytic tools, and migration from different databases.
  • Involved in loading data from Linux and UNIX file systems to HDFS.
  • Wrote Hive UDFs to extract data from staging tables (a minimal sketch follows this list).
  • Analyzed web log data using HiveQL and processed it through Flume.
  • Replaced default Derby metadata storage system for Hive with MySQL system.
  • Executed queries using Hive and developed Map-Reduce jobs to analyze data.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Developed Pig UDFs to preprocess the data for analysis.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig.
  • Responsible for developing custom UDFs, UDAFs and UDTFs in Pig and Hive.
  • Optimized Hive queries using various file formats like JSON, AVRO, ORC and Parquet.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Analyzed tweet JSON data using the Hive SerDe API to deserialize it and convert it into a readable format.
  • Processed application web logs using Flume and loaded them into Hive for analysis.
  • Implemented RESTful Web Services to interact with Oracle to store/retrieve the data.
  • Generated detailed design documentation for the source-to-target transformations.
  • Wrote UNIX scripts to monitor data load/transformation.
  • Involved in planning process of iterations under the Agile Scrum methodology.
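
As a small illustration of the Hive UDF work referenced above, a sketch of an old-style UDF written in Scala (Java works the same way); the field format and class name are hypothetical:

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical UDF: pulls a customer id out of a raw staging-table field
    // such as "cust=12345|src=web"; returns null for malformed input.
    class ExtractCustomerId extends UDF {
      def evaluate(raw: Text): Text = {
        if (raw == null) return null
        val pattern = """cust=(\d+)""".r
        pattern.findFirstMatchIn(raw.toString) match {
          case Some(m) => new Text(m.group(1))
          case None    => null
        }
      }
    }

Once packaged into a jar, such a class would typically be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION before being used in queries against the staging tables.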

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Spark, Oozie, Zookeeper, AWS, RDBMS/DB, MySQL, CSV, AVRO data files

Confidential, Louisville, KY

Hadoop Developer

Responsibilities:
  • Involved in gathering the requirements, designing, development and testing
  • Created the script files for processing data and loading to HDFS
  • Used HDFS CLI commands for file operations.
  • Developed the UNIX shell scripts for creating the reports from Hive data.
  • Completely involved in the requirement analysis phase.
  • Analyzing the requirement to setup a cluster
  • Created two different users (hduser for performing HDFS operations and mapred user for performing MapReduce operations only).
  • Involved in generating crawl-data flat files from various retailers and moving them to HDFS for further processing.
  • Created the Apache PIG scripts to process the HDFS data.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Sqoop scripts to enable interaction between Pig and the Cassandra database.
  • Setup Hive with MySQL as a Remote Metastore
  • Moved log/text files generated by various products into an HDFS location.
  • Created MapReduce code that takes log files as input, parses the logs and structures them in a tabular format to facilitate effective querying of the log data (see the sketch after this list).
  • Created External Hive Table on top of parsed data.
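
A minimal sketch of the map side of that log-parsing job, written in Scala against the Hadoop MapReduce API; the combined-log regex and the choice of output columns are illustrative assumptions, not the exact production layout:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.Mapper

    // Hypothetical mapper: turns one raw web-server log line into tab-separated columns,
    // keyed by client IP, so the output can sit under an external Hive table.
    class LogParseMapper extends Mapper[LongWritable, Text, Text, Text] {
      private val logPattern =
        """^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)""".r

      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
        value.toString match {
          case logPattern(ip, ts, method, url, status, bytes) =>
            context.write(new Text(ip), new Text(Seq(ts, method, url, status, bytes).mkString("\t")))
          case _ => // skip malformed lines
        }
      }
    }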

Environment: Cloudera Distribution, Hadoop MapReduce, HDFS, Python, Hive, HBase, HiveQL, Sqoop, Java, Unix, Maven.

Confidential

Sr. Java Developer

Responsibilities:
  • Designed and developed the interface to interact with web services for card payments.
  • Performed enhancements to existing SOAP web services for online card payments
  • Performed enhancements to existing billing screens by developing servlets and JSP Pages
  • Involved in end to end batch loading process using Abinitio
  • Transformed the Use Cases into Class Diagrams, Sequence Diagrams and State diagrams
  • Developed Validation Layer providing Validator classes for input validation, pattern validation and access control
  • Generated Domain Layer classes using DAO’s from the Database Schema.
  • Defined set of classes for the Helper Layer which validates the Data Models from the Service Layer and prepares them to display in JSP Views.
  • Used AJAX calls to dynamically assemble the data in JSP page, on receiving user input.
  • Used Log4J to print the logging, debugging, warning, info on the server console.
  • Involved in creation of Test Cases for JUnit Testing and carried out Unit testing.
  • Used SVN as configuration management tool for code versioning and release deployment on Oracle Weblogic Server 10.3.
  • Used ANT tool for deployment of the web application on the Weblogic Server.
  • Interacted with business team to transform requirements into technical solutions.
  • Involved in the functional tests of the application and also resolved production issues.
  • Designed and Developed application using EJB and Struts framework.
  • Developed POJO’s for Data Model to map the Java Objects with Relational database tables.
  • Designed and developed Service layer using Struts framework.
  • Used MVC based Struts framework to develop the multi-tier web application presentation layer components.
  • Involved in Integration of Struts with Database.
  • Implemented Struts tag libraries like html, logic, tab, bean etc in the JSP pages.
  • Used Struts tiles libraries for layout of web page, and performed struts validations using Struts validation framework.
  • Implemented Oracle database and JDBC drivers to access the data.
  • Involved in design, analysis and architectural meetings. Created Architecture Diagrams, and Flow Charts using Rational Rose.
  • Involved in agile software development practice paired programming, test driven development and scrum status meetings.
  • Developed use case diagrams, class diagrams, database tables, and mapping between relational database tables.
  • Developed Unit test cases using JUnit.
  • Maintained the application configuration information in various properties files.
  • Performed unit testing, system testing and integration testing.

Environment: Java, J2EE, Struts & Tiles, SOAP, Web Services, ANT, Solaris, WebLogic 7.0, Oracle 8i, Abinitio, Mainframe, OSS/BSS, Log4j, Servlets, JSP, JSTL, JDBC, HTML, JavaScript, CSS, Rational Rose, UML

Confidential

Java Developer

Responsibilities:

  • Worked on Service Layer which provided business logic implementation.
  • Involved in building PL/SQL queries and stored procedures for Database operations.
  • Involved in specification analysis and identifying the requirements.
  • Participated in design discussions for the methodology of requirement implementation
  • Involved in preparation of the Code Review Document & Technical Design Document
  • Designed and developed the presentation layer using JSP pages for the payment module.
  • Developed controllers and JavaBeans encapsulating the business logic
  • Developed classes to interface with underlying web services layer
  • Used patterns including Singleton, MVC, DAO, DTO, Front Controller, Service Locator and Business Delegate.
  • Used Jasper Reports to provide print preview of Financial Reports and Monthly Statements.
  • Carried out integration testing & acceptance testing
  • Used JMeter to carry out performance tests on external web service calls, database connections and other dynamic resources.
  • Participated in the team meetings and discussed enhancements, issues and proposed feasible solutions

Environment: Java 1.4, J2EE 1.4, Servlets, JSP, JDBC, XML, ANT, Apache Tomcat 5.0, Oracle 8i, JUnit, PL/SQL, UML, NetBeans
