Hadoop Developer Resume
Little Rock, AR
PROFESSIONAL SUMMARY:
- 7+ years of experience with emphasis on Big Data technologies and the design and development of Java-based enterprise applications.
- 4+ years of experience in developing applications that perform large-scale distributed data processing using Big Data ecosystem tools: Hadoop, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Kafka, Oozie, ZooKeeper, Flume, YARN and Avro.
- Excellent understanding/knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm, with good hands-on experience in PySpark and SQL queries.
- Hands-on experience with major components in the Hadoop ecosystem including Hive, HBase, HBase and Hive integration, Sqoop and Flume, and knowledge of the MapReduce/HDFS framework.
- Set up standards and processes for Hadoop-based application design and implementation.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Experience with Hortonworks and Cloudera Hadoop environments.
- Set up data storage in AWS using S3 buckets and configured instance backups to S3.
- Good experience in analysis using Pig and Hive, and understanding of Sqoop and Puppet.
- Expertise in database performance tuning and data modeling.
- Experienced in securing Hadoop clusters with Kerberos and integrating with LDAP/AD at the enterprise level.
- Involved in establishing Cassandra best practices, migrating the application for Choice from the legacy platform to Cassandra, and upgrading to Cassandra 3.
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
- Used the Spark-Cassandra Connector to load data to and from Cassandra.
- Hands-on experience in Apache Spark creating RDDs and DataFrames, applying transformations and actions, and converting RDDs to DataFrames (a minimal PySpark sketch follows this summary).
- Migrated various Hive UDFs and queries into Spark SQL for faster execution.
- Experience in data processing such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Experience in using Apache Kafka for log aggregation.
- Developed a data pipeline using Kafka and Spark Streaming to store data in HDFS and performed real-time analytics on the incoming data.
- Experience in importing real-time data into Hadoop using Kafka and implementing Oozie jobs for daily imports.
- Loaded data into EMR from various sources such as S3 and processed it using Hive scripts.
- Explored various Spark modules and worked with DataFrames, RDDs and SparkContext.
- Performed map-side joins on RDDs and imported data from sources such as HDFS and HBase into Spark RDDs.
- Familiarity and experience with data warehousing and ETL tools. Good working knowledge of OOA and OOD using UML and designing use cases.
- Experience working on Solr to develop search over unstructured data in HDFS.
- Used Solr indexing to enable searches on non-primary-key columns in Cassandra keyspaces.
- Good understanding of Scrum methodologies, Test Driven Development and Continuous integration.
- Experience in production support and application support by fixing bugs.
- Used HP Quality Center for logging test cases and defects.
- Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, self-motivation and teamwork, along with excellent interpersonal, technical and communication skills.
- Designed and implemented Hive and Pig UDFs using Java for evaluation, filtering, loading and storing of data.
- Experience working with Java/J2EE, JDBC, ODBC, JSP, Eclipse, JavaBeans, EJB and Servlets.
- Expert in developing web page interfaces using JSP, Java Swing and HTML.
- Excellent understanding of JavaBeans and the Hibernate framework for implementing model logic that interacts with RDBMS databases.
- Experience in using IDEs such as Eclipse and NetBeans, and build tools such as Maven.
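A minimal PySpark sketch of the RDD and DataFrame work summarized above; the file path, column names and application name are illustrative assumptions, not details from an actual project.

    # Build an RDD, apply transformations/actions, and convert it to a DataFrame.
    from pyspark.sql import SparkSession, Row

    spark = SparkSession.builder.appName("rdd-to-dataframe-sketch").getOrCreate()
    sc = spark.sparkContext

    # Transformations (lazy): parse delimited lines into Row objects.
    raw = sc.textFile("hdfs:///data/claims/input")  # hypothetical input path
    rows = (raw.map(lambda line: line.split(","))
               .filter(lambda f: len(f) == 3)
               .map(lambda f: Row(claim_id=f[0], branch=f[1], amount=float(f[2]))))

    # Convert the RDD to a DataFrame and query it with Spark SQL.
    df = spark.createDataFrame(rows)
    df.createOrReplaceTempView("claims")
    totals = spark.sql("SELECT branch, SUM(amount) AS total FROM claims GROUP BY branch")

    # Action: materialize and display the aggregated result.
    totals.show()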
TECHNICAL SKILLS:
Hadoop Core Services: HDFS, MapReduce, Spark, YARN, Hive, Pig, Scala, Kafka, Flume, Tez, Impala, Solr, Oozie, ZooKeeper.
Hadoop Distributions: Hortonworks, Cloudera, EMR
NoSQL Databases: HBase, Cassandra, MongoDB
Cloud Computing Tools: Amazon AWS
Languages: Java/J2EE, Python, SQL, PL/SQL, Pig Latin, HiveQL, UNIX Shell Scripting
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB
Application Servers: WebLogic, WebSphere, JBoss, Tomcat.
Databases: Oracle, MySQL, SQL
Operating Systems: UNIX, Windows, Linux, CentOS
Build Tools: Jenkins, Maven, ANT
Development Tools: Microsoft SQL Studio, Eclipse
Development methodologies: Agile/Scrum, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, Little Rock, AR
Hadoop Developer
Responsibilities:
- Implemented AWS solutions using EC2, S3 and load balancers.
- Installed applications on AWS EC2 instances and configured storage on S3 buckets.
- Stored and loaded data from HDFS to Amazon S3 and backed up the namespace data.
- Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals and testing HDFS and Hive access.
- Involved in creating Hadoop streaming jobs using Python.
- Enabled concurrent access to Hive tables with shared and exclusive locking, backed by the ZooKeeper implementation in the cluster.
- Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop for analysis, visualization and report generation.
- Implemented PySpark and Spark SQL for faster testing and processing of data.
- Developed multiple MapReduce jobs in Java for data cleaning.
- Developed Hive UDF to parse the staged raw data to get the Hit Times of the claims from a specific branch for a particular insurance type code.
- Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Used Scala to write the code for all the use cases in Spark and Spark SQL.
- Expertise in implementing Spark and Scala applications using higher-order functions for both batch and interactive analysis requirements.
- Implemented Spark batch jobs.
- Worked with the Spark Core, Spark Streaming and Spark SQL modules of Spark.
- Worked on reading multiple data formats on HDFS using PySpark.
- Ran many performance tests using the Cassandra-stress tool in order to measure and improve the read and write performance of the cluster.
- Created data model for structuring and storing the data efficiently. Implemented partitioning and bucketing of tables in Cassandra.
- Worked on migrating MapReduce programs into PySpark transformations.
- Built wrapper shell scripts to launch Oozie workflows.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Worked on distributed/cloud computing (MapReduce/Hadoop, Pig, HBase, Avro, ZooKeeper, etc.) and Amazon Web Services (S3, EC2, EMR, etc.).
- Provided ad-hoc queries and data metrics to the Business Users using Hive, Pig.
- Worked on MapReduce jobs for querying multiple semi-structured data formats as per analytic needs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
- Developed a POC for Apache Kafka and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
- Involved in a SolrCloud implementation to provide real-time search capabilities on a repository with terabytes of data.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data to HDFS using Scala (a PySpark analogue is sketched after this section).
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Familiarity with NoSQL databases such as Cassandra.
- Wrote shell scripts to automate rolling day-to-day processes.
- Performed bug fixing and 24x7 production support for running processes.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop 2, MapReduce, Hive, HDFS, Cassandra, Pig, Sqoop, Oozie, EMR, Solr, HBase, ZooKeeper, CDH5, MongoDB, Oracle, NoSQL, Unix/Linux, Apache Kafka, Amazon Web Services.
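The Kafka-to-HDFS streaming job above was implemented in Scala; the sketch below is a PySpark analogue using Structured Streaming rather than the original DStream code. Broker addresses, the topic name and the HDFS paths are placeholder assumptions, and the spark-sql-kafka connector package is assumed to be on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()

    # Read the Kafka topic as a streaming DataFrame (placeholder brokers/topic).
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
              .option("subscribe", "clickstream")
              .load()
              .select(col("value").cast("string").alias("event")))

    # Persist the stream to HDFS as Parquet, with checkpointing for recovery.
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/streams/clickstream")
             .option("checkpointLocation", "hdfs:///checkpoints/clickstream")
             .start())

    query.awaitTermination()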
Confidential, Chicago, IL
Big Data/ Hadoop Developer
Responsibilities:
- Built a scalable distributed data solution using Hadoop on a 30-node AWS cluster to run analysis on 25+ terabytes of customer usage data.
- Developed several complex MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
- Used MapReduce to index large amounts of data for easy access to specific records.
- Performed ETL using Pig, Hive and MapReduce to transform transactional data to de-normalized form.
- Configured periodic incremental imports of data from DB2 into HDFS using Sqoop.
- Exported data using Sqoop from HDFS to Teradata on a regular basis.
- Developed ETL scripts for data acquisition and transformation using Informatica and Talend.
- Installed and configured Flume, Hive, Pig, Sqoop and HBase on the Hadoop cluster.
- Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
- Wrote Pig and Hive UDFs to analyze the complex data to find specific user behavior.
- Used the Oozie workflow engine to schedule multiple recurring and ad-hoc Hive and Pig jobs.
- Created HBase tables to store various data formats coming from different portfolios.
- Created Python scripts to automate the workflows.
- Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
- Designed and implemented Hive and Pig UDFs using Python for evaluation, filtering, loading and storing of data.
- Developed simple to complex MapReduce streaming jobs in Python that were used with Hive and Pig (see the streaming sketch after this section).
- TIBCO JasperSoft was used for embedding BI reports.
- Experience in writing Python scripts for automated jobs.
- Assisted the team responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
- Worked on conversion of Teradata and RDBMS data into Hadoop files.
- Worked actively with various teams to understand and accumulate data from different sources based on business requirements.
- Worked with the testing teams to fix bugs and ensure smooth and error-free code.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, HBase, Zookeeper, PL/SQL, MySQL, DB2, Teradata.
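A minimal Hadoop Streaming sketch in Python in the spirit of the streaming jobs described above; the web-log field layout and the choice of key are illustrative assumptions.

    #!/usr/bin/env python
    # mapper.py - emits (status_code, 1) for each web log line.
    import sys

    for line in sys.stdin:
        fields = line.strip().split()
        if len(fields) > 8:                 # assumed combined-log-format layout
            print("%s\t1" % fields[8])      # HTTP status code field (assumption)

    #!/usr/bin/env python
    # reducer.py - sums the counts per status code (input arrives sorted by key).
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key and current_key is not None:
            print("%s\t%d" % (current_key, count))
            count = 0
        current_key = key
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

Scripts like these would typically be submitted through the hadoop-streaming jar, passing -mapper, -reducer, -input and -output options.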
Confidential, Salt Lake City, UT
Hadoop Developer
Responsibilities:
- Responsible for developing efficient MapReduce programs on the AWS cloud over more than 20 years’ worth of claim data to detect and separate fraudulent claims.
- Developed MapReduce programs from scratch, ranging from medium to complex.
- Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
- Played a key role in setting up a 40-node Hadoop cluster utilizing Apache MapReduce, working closely with the Hadoop administration team.
- Worked with the advanced analytics team to design fraud detection algorithms and then developed MapReduce programs to run the algorithms efficiently on huge datasets.
- Developed Java programs to perform data scrubbing for unstructured data.
- Responsible for designing and managing the Sqoop jobs that uploaded data from Oracle to HDFS and Hive (see the wrapper sketch after this section).
- Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
- Used Flume to collect log data with error messages across the cluster.
- Designed and Maintained Oozie workflows to manage the flow of jobs in the cluster.
- Played a key role in the installation and configuration of various Hadoop ecosystem tools such as Hive, Pig and HBase.
- Successfully loaded files to HDFS from Teradata and loaded data from HDFS to Hive.
- Experience in using ZooKeeper and Oozie for coordinating the cluster and scheduling workflows.
- Installed the Oozie workflow engine and scheduled it to run data/time-dependent Hive and Pig jobs.
- Designed and developed Dashboards for Analytical purposes using Tableau.
- Analyzed the Hadoop log files using Pig scripts to identify errors.
- Actively updated upper management with daily progress reports on the project, including the classification levels in the data.
Environment: Java, Hadoop, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Teradata.
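A hedged Python sketch of a wrapper around the kind of Oracle-to-Hive Sqoop load described above; the connection string, credentials path, table names and mapper count are placeholders, not the actual job definition.

    # Wrapper that shells out to Sqoop to load an Oracle table into Hive.
    import subprocess

    def sqoop_import(oracle_table, hive_table):
        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:oracle:thin:@//dbhost:1521/CLAIMS",  # placeholder DSN
            "--username", "etl_user",                               # placeholder user
            "--password-file", "hdfs:///user/etl/.sqoop.pwd",       # placeholder secret
            "--table", oracle_table,
            "--hive-import",
            "--hive-table", hive_table,
            "--num-mappers", "4",
        ]
        subprocess.check_call(cmd)

    if __name__ == "__main__":
        sqoop_import("CLAIMS_RAW", "claims.claims_raw")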
Confidential
Hadoop Developer
Responsibilities:
- Installed and configured the Hadoop cluster
- Worked on a Hortonworks cluster, which provides an open-source platform based on Apache Hadoop for analyzing, storing and managing big data
- Worked with analysts to determine and understand business requirements
- Loaded and transformed large data sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts
- Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer data and financial histories into HDFS for analysis
- Used MapReduce and Flume to load, aggregate, store and analyze web log data from different web servers
- Created MapReduce programs to handle semi/unstructured data like XML, JSON, AVRO data files and sequence files for log files
- Involved in submitting and tracking MapReduce jobs using JobTracker
- Experience writing Pig Latin scripts for data cleansing, ETL operations and query optimization of existing scripts
- Wrote Hive UDFs to sort struct fields and return complex data types
- Created Hive tables from JSON data using serialization frameworks such as Avro
- Experience writing reusable custom Hive and Pig UDFs in Java and using existing UDFs from Piggybank and other sources
- Experience in working with the NoSQL database HBase for real-time data analytics
- Integrated Hive tables with HBase to perform row-level analytics (see the client sketch after this section)
- Developed Oozie workflows for daily incremental loads, which Sqoop data from Teradata and Netezza and then import it into Hive tables
- Involved in performance tuning by using different execution engines such as Tez
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files
- Implemented Daily Cron jobs that automate parallel tasks of loading the data into HDFS using AutoSys and Oozie coordinator jobs
- Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MRUnit testing library
- Supported operations team in Hadoop cluster maintenance activities including commissioning and decommissioning nodes and upgrades
- Providing technical solutions/assistance to all development projects
Environment: Hortonworks, Java, Hadoop, HDFS, MapReduce, Tez, Hive, Pig, Oozie, Sqoop, Flume, Teradata, Netezza, Tableau
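A minimal sketch of the row-level HBase access described above, shown with the Python happybase client rather than the Java API used on the project; the host, table and column-family names are illustrative, and a running HBase Thrift server is assumed.

    import happybase

    # Connect through the HBase Thrift gateway (placeholder host).
    connection = happybase.Connection("hbase-thrift-host")
    table = connection.table("web_events")   # placeholder table with family 'cf'

    # Row-level write: one event keyed by user and timestamp.
    table.put(b"user123|20150101T120000", {
        b"cf:page": b"/checkout",
        b"cf:status": b"200",
    })

    # Row-level read back for analytics on a single key.
    row = table.row(b"user123|20150101T120000")
    print(row.get(b"cf:page"))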
Confidential
Java Developer
Responsibilities:
- Involved in designing the Project Structure, System Design and every phase in the project.
- Responsible for developing platform related logic and resource classes, controller classes to access the domain and service classes.
- Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
- Designed the user interface and implemented validations using JavaScript.
- Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
- Involved in Technical Discussions, Design, and Workflow.
- Participated in requirement gathering and analysis.
- Developed Unit Testing cases using JUnit Framework.
- Implemented the data access using Hibernate and wrote the domain classes to generate the Database Tables.
- Involved in the design of JSPs and Servlets for navigation among the modules.
- Designed cascading style sheets and the XML part of the Order Entry and Product Search modules, and performed client-side validations with JavaScript.
- Involved in implementation of view pages based on XML attributes using normal Java classes.
- Involved in integration of App Builder and UI modules with the platform.
Environment: Hibernate, Java, JAXB, JUnit, XML, UML, Oracle 11g, Eclipse, Windows XP.