Sr. Hadoop Developer Resume
Malvern, PA
SUMMARY
- Over 9 years of professional experience in the IT industry, developing, implementing, configuring, and testing Hadoop ecosystem components and maintaining various web-based applications using Java and J2EE
- 3+ years of experience as a Hadoop Developer with sound knowledge of Hadoop ecosystem technologies
- Experience with other big data technologies such as Apache Kafka, Apache Drill, Spark, and Storm.
- Hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, and Flume.
- Good hands-on experience working with stream data ingestion tools such as Kafka and Storm.
- Excellent knowledge of distributed storage (HDFS) and distributed processing (MapReduce, YARN)
- Hands-on experience in developing MapReduce programs according to requirements
- Hands-on experience in performing data cleaning and pre-processing using Java and the Talend data preparation tool
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Hands-on experience with message brokers such as Apache Kafka, IBM WebSphere MQ, and RabbitMQ.
- Expertise in optimizing network traffic during the shuffle using Combiners, joining datasets with different schemas using joins, and organizing data using Partitioners and Buckets (a driver sketch illustrating this follows the summary).
- Experience in writing custom Counters for analyzing data and in testing with the MRUnit framework. Expertise with NoSQL databases such as HBase and Cassandra.
- Expertise in working with different kinds of data files such as flat files, CSV, JSON, Avro, and XML, as well as databases.
- Hands-on experience importing and exporting data between databases such as MySQL, Oracle, Teradata, and DB2 and HDFS using Sqoop. Experience in using Flume to load log files into HDFS.
- Experience in working with different compression techniques in Hadoop such as LZO and Snappy.
- Strong experience in collecting and storing log data in HDFS using Apache Flume
- Strong command of Hive and Pig core functionality; wrote Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources
- Good knowledge of database connectivity (JDBC) for databases like Oracle, DB2, SQL Server, MySQL, MS Access and Netezza.
- Worked extensively on different Hadoop distributions like CDH and Hortonworks
- Strong experience working with real-time streaming applications and batch-style, large-scale distributed computing applications using tools like Spark Streaming, Kafka, Flume, MapReduce, and Hive.
- Proficient in using various IDEs such as Eclipse, MyEclipse, and NetBeans
- Hands-on experience with job workflow scheduling and monitoring tools like Oozie and ZooKeeper
- Good hands-on experience in developing Hadoop applications on Spark using Scala as a functional and object-oriented programming language.
- Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka, and Flume.
- Involved in writing test cases using MRUnit. Excellent programming skills at a higher level of abstraction using Scala and Spark. Good understanding of processing real-time data using Spark
- Developed small distributed applications in our projects using ZooKeeper and scheduled the workflows using Oozie
- Hands-on experience with the build management tools Maven and Ant
- Expertise in design and development of web applications involving J2EE technologies with Java, Spring, EJB, AJAX, Servlets, JSP, Struts, Web Services, XML, JMS, JNDI, JDBC, etc.
- Hands-on experience with object-oriented design, modeling, programming, and testing in Java, J2EE, XML, and relational databases
- Worked on S3 buckets on AWS to store CloudFormation templates and created EC2 instances on AWS. Experience in all phases of the software development life cycle
- Hands-on experience with UNIX scripting and Cassandra. Experience with cloud configuration in Amazon Web Services (AWS)
- Good hands-on experience working with REST and SOAP web services.
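A minimal, illustrative MapReduce driver showing how a Combiner and a custom Partitioner reduce shuffle traffic and control how keys are routed to reducers, as referenced in the summary above. The word-count mapper/reducer and the first-character partitioning rule are hypothetical examples, not taken from any specific project below.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    // Emits (token, 1) for every whitespace-separated token.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Sums counts; used both as the combiner (map-side pre-aggregation) and the reducer.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            ctx.write(key, new IntWritable(sum));
        }
    }

    // Hypothetical partitioner: routes keys to reducers by their first character.
    public static class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            int first = key.getLength() > 0 ? key.charAt(0) : 0;
            return (first & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);       // combiner cuts traffic shuffled across the network
        job.setReducerClass(SumReducer.class);
        job.setPartitionerClass(FirstCharPartitioner.class);
        job.setNumReduceTasks(4);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```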
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie, and ZooKeeper
NoSQL Databases: HBase and Cassandra
Java/J2EE: JSP, Servlets, AJAX, EJB, Struts, Spring, JDBC
Programming Languages: JAVA, Scala, C, C++, Python
Frameworks: Struts, Spring
Databases: MySQL, SQL Server, DB2, and Teradata
Web Services: REST, AWS, Jersey, Axis 1.x, SOAP, WSDL, UDDI
Servers: Apache Tomcat, WebSphere, JBoss
Open Source tools: Apache Ant, Log4j
IDEs: MyEclipse, Eclipse, IntelliJ IDEA, NetBeans, WSAD
Web UI: HTML, JavaScript, XML, SOAP, WSDL
Operating Systems: Linux, Unix, Windows, Solaris
PROFESSIONAL EXPERIENCE
Confidential - Malvern, PA
Sr. Hadoop Developer
Responsibilities:
- All datasets were loaded from two different sources, Oracle and MySQL, into HDFS and Hive respectively on a daily basis.
- Received 8 flat files, all comma-delimited.
- Imported real-time data into Hadoop using Kafka and implemented the corresponding Oozie jobs.
- Received an average of 80 GB of data daily; overall, the project's data warehouse held 5 PB of data, and a 110-node cluster was used to process it.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Responsible for creating Hive tables to load the data coming from MySQL and for loading data from Oracle into HDFS using Sqoop
- Wrote core Java programs to perform data cleaning, pre-processing, and validation.
- Involved in verifying cleaned data with another department using the Talend tool. Experienced in creating Hive schemas and external tables and in managing views.
- Developed Hive UDFs and reused them for other requirements. Worked on performing join operations.
- Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions. Streamed data in real time using Spark with Kafka for faster processing.
- Involved in creating partitions on external tables. Wrote HQL statements per user requirements.
- Exported HQL results to CSV files and handed them over to the reporting team. Worked with Hive complex data types and with bucketing.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala; good experience using Spark-Shell and Spark Streaming (a sketch of this pattern follows this list).
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Imported millions of structured records from relational databases using Sqoop to process them with Spark, and stored the data in HDFS in CSV format.
- Used Spark SQL to process large volumes of structured data. Worked on big data integration and analytics based on Hadoop, Spark, and Kafka.
- Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
- Assigned names to each of the columns using the case class option in Scala.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Used the DataFrame API in Scala to convert distributed collections of data into named columns.
- Registered the datasets as Hive tables.
- Developed solutions to pre-process large sets of structured data in different file formats (text files, Avro data files, sequence files, XML and JSON files, ORC, and Parquet).
- Experienced with batch processing of data sources using Apache Spark. Configured Kafka and Storm clusters to handle the load and optimized them to achieve the desired throughput
- Developed predictive analytics using the Apache Spark Scala APIs.
- Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala
- Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.
- Developed a Storm monitoring bolt for validating pump tag values against high-low limits.
- Good knowledge of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Worked extensively on the Spark core and Spark SQL modules. Expertise in running Hadoop streaming jobs to process terabytes of data.
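A minimal sketch of the HiveQL-to-Spark pattern described above: load Sqoop-exported CSV files from HDFS, give the columns names, expose the data to SQL, and hand an aggregated result back as CSV for reporting. It uses the Spark SQL Java API (the project work itself was done in Scala), and the HDFS paths, view name, and column names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CustomerHqlJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("customer-hql-job")
                .enableHiveSupport()          // allows registered tables/views to be queried with HiveQL
                .getOrCreate();

        // Hypothetical HDFS path and column names for the Sqoop-exported CSV files.
        Dataset<Row> customers = spark.read()
                .option("header", "false")
                .option("inferSchema", "true")
                .csv("hdfs:///data/raw/customers/*.csv")
                .toDF("customer_id", "region", "balance");

        customers.createOrReplaceTempView("customers");

        // Run the HiveQL/SQL statement as a Spark SQL query.
        Dataset<Row> totals = spark.sql(
                "SELECT region, SUM(balance) AS total_balance "
              + "FROM customers GROUP BY region");

        // Write a single CSV file for the reporting team (hypothetical output path).
        totals.coalesce(1)
              .write()
              .option("header", "true")
              .csv("hdfs:///data/reports/region_totals");

        spark.stop();
    }
}
```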
Environment: Hadoop, HDFS, Hive, Scala, Spark, Storm, SQL, HBase, UNIX shell scripting, Talend.
Confidential - Raleigh, NC
Hadoop Developer
Responsibilities:
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in writing MapReduce jobs. Involved in using Sqoop and HDFS put/copyFromLocal to ingest data.
- Implemented Kafka-Storm topologies capable of handling and channeling high-volume data streams, and integrated the Storm topologies with Esper to filter and process that data across multiple clusters for complex event processing
- Used Pig for transformations, event joins, filtering bot traffic, and some pre-aggregations before storing the data in HDFS.
- This involved Kafka and Storm cluster design, installation, and configuration. Developed a Kafka producer to produce 1 million messages per second (a producer sketch follows this list).
- Developed Pig UDFs for needed functionality that is not available out of the box in Apache Pig.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Good experience developing Hive DDL to create, alter, and drop Hive tables.
- Involved in developing Hive UDFs for needed functionality that is not available out of the box in Apache Hive.
- Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline; this pipeline also used Amazon Web Services EMR, S3, and RDS
- Used HCatalog to access Hive table metadata from MapReduce and Pig code.
- Used Java MapReduce to compute various metrics that define user experience, revenue, etc.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS. Designed and implemented various metrics that can statistically signify the success of an experiment.
- Worked on AWS to create EC2 instances and installed Java, ZooKeeper, and Kafka on those instances.
- Worked on S3 buckets on AWS to store CloudFormation templates. Used Eclipse and Ant to build the application.
- Used Sqoop for importing and exporting data into HDFS and Hive.
- Responsible for processing ingested raw data using MapReduce, Apache Pig and Hive.
- Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
- Involved in pivoting HDFS data from rows to columns and columns to rows.
- Involved in emitting processed data from Hadoop to relational databases and external file systems using Sqoop and HDFS get/copyToLocal.
- Developed shell scripts to orchestrate the execution of all other scripts (Pig, Hive, and MapReduce) and to move data files into and out of HDFS.
- Held a couple of workshops on Spark, RDDs, and Spark Streaming.
- Discussed implementation-level details of concurrent programming in Spark using Python with message passing.
- Involved in discussions of Spark SQL and Spark MLlib
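A minimal sketch of a high-throughput Kafka producer along the lines described above. The broker addresses, topic name, payloads, and tuning values are hypothetical; acks, linger.ms, and batch.size are the usual knobs traded off for throughput.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class WeblogProducer {
    public static void main(String[] args) {
        // Hypothetical broker list; serializers for simple String key/value records.
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "1");             // leader-only acks: trades durability for throughput
        props.put("linger.ms", "5");        // small batching window
        props.put("batch.size", "65536");   // larger batches for high message rates

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1_000_000; i++) {
                // Records with the same key land in the same partition (hypothetical topic/payload).
                producer.send(new ProducerRecord<>("weblog-events",
                        Integer.toString(i), "clickstream-payload-" + i));
            }
            producer.flush();
        }
    }
}
```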
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Storm, Flume, AWS, Oracle 11g, Core Java, Cloudera HDFS, Eclipse.
Confidential - Rapid City, SD
Hadoop Developer
Responsibilities:
- Hands-on experience coding MapReduce programs and Hive queries, and testing and debugging MapReduce programs.
- Knowledge of installing, configuring, and managing a Hadoop cluster spanning multiple racks.
- Implemented best income logic using Pig scripts and UDFs.
- Developed Pig Latin scripts to analyze large data sets in areas where extensive coding needed to be reduced.
- Involved in using the Sqoop tool to extract data from relational databases into Hadoop.
- Responsible for performance enhancement and optimization of the code by writing custom comparators and combiner logic.
- Worked closely with data warehouse architect and business intelligence analyst to develop solutions.
- Job management using Fair scheduler.
- Responsible for performing peer code reviews, troubleshooting issues and maintaining status report.
- Hands-on experience creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Responsible for identifying possible ways to improve the efficiency of the system. Involved in requirement analysis, design, development, and unit testing using MRUnit and JUnit (a test sketch follows this list).
- Prepared daily and weekly project status reports and shared them with the client.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
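A minimal sketch of an MRUnit test of the kind mentioned above: it exercises a mapper in isolation and asserts on its emitted key/value pairs. The TokenCountMapper under test and its input/output are hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class TokenCountMapperTest {

    // Hypothetical mapper under test: emits (token, 1) per whitespace-separated token.
    public static class TokenCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new TokenCountMapper());
    }

    @Test
    public void emitsOneCountPerToken() throws IOException {
        // MRUnit runs the mapper in-memory and verifies the expected output pairs in order.
        mapDriver.withInput(new LongWritable(0L), new Text("hadoop hive"))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .withOutput(new Text("hive"), new IntWritable(1))
                 .runTest();
    }
}
```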
Environment: Apache Hadoop, Java (JDK 1.7), Oracle, MySQL, Hive, Pig, Sqoop, Linux, CentOS, JUnit, MRUnit
Confidential - Raleigh, NC
Java/J2EE developer
Responsibilities:
- Interacted with the team to understand the requirements.
- Involved in the complete lifecycle (SDLC) of the project, i.e. design, development, implementation, unit testing, and support. Developed the admin portal of the project. Performed unit testing and system integration testing.
- Fixed bugs based on client requirements and made enhancements to web forms and class files.
- Implemented an FTP utility program for copying the contents of an entire directory recursively up to two levels from a remote location using socket programming.
- Implemented a reliable socket interface using a sliding window protocol (like TCP stream sockets) over an unreliable UDP communication channel, and later tested it using the FTP utility program.
- Strong domain knowledge of TCP/IP with expertise in socket programming and the IP security domain (IPsec, TLS, SSL, VPNs, firewalls, and NATs).
- Built strong communication between source and destination for message exchange using socket programming.
- Read data from and wrote data to the servers.
- Handled communication with multiple clients using socket programming.
- Proficient in developing SOAP and RESTful Web Services
- Built RESTful web services using the JAX-RS API.
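A minimal sketch of a JAX-RS resource of the kind described in the last two bullets. The resource path, class name, and JSON payload are hypothetical; only the standard JAX-RS annotations are assumed.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// Hypothetical resource: exposes a read-only account lookup over HTTP GET.
@Path("/accounts")
public class AccountResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getAccount(@PathParam("id") String id) {
        // A real service would delegate to a DAO or service layer;
        // a hard-coded JSON payload is returned here for illustration.
        String json = "{\"id\": \"" + id + "\", \"status\": \"ACTIVE\"}";
        return Response.ok(json).build();
    }
}
```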
Environment: Tomcat 7.32, Apache James 3.0-beta, JDK 1.6, Core Java, Hibernate, JSP, Servlets, Socket Programming, Ajax, JavaScript, SQL Server 2008, JAX-WS using Apache CXF, PHP 5.2.1, Windows 2008 Server.
Confidential
JAVA Developer
Responsibilities:
- Gathered requirements from end users and created functional requirements. Designed class diagrams and sequence diagrams
- Developed the graphical user interface for the planning screen. Created an Excel template for the reporting function
- Developed business logic for the planning screen, covering both domestic and international assignments
- Developed business logic for filling the Excel template with values, where formulas are maintained within Excel to display the graphs
- Created a requirements and test specification traceability matrix. Implemented the four-eyes principle and created a quality check process for every module release
- Supported user acceptance testing. Developed a help screen to display end-user documentation and contact information
- Responsible for versioning and overall deployment of the tool
Environment: Eclipse, Tomcat, SVN, JSP, Struts, Spring, Hibernate, Oracle, JavaScript
Confidential
Junior JAVA Developer
Responsibilities:
- Gathered requirements from end users and created functional requirements. Contributed to process flow analysis of the functional requirements
- Developed the graphical user interface for the user self-service screen. Developed business logic for user self-service and integrated it with all workflows on the platform
- Developed business logic for custom permission management at the platform level
- Created a requirements and test specification traceability matrix
- Contributed to the integration of workflows in the web portal
- Implemented the four-eyes principle and created a quality check process reusable across all workflows at the overall platform level
- Supported end-user training, testing, and documentation
Environment: Eclipse, Tomcat, SVN, JSP, Struts, Spring, Hibernate, Oracle, JavaScript