Hadoop Developer Resume

Jersey City, NJ

SUMMARY

  • Over 8 years of experience with emphasis on Big Data technologies and the development and design of Java-based enterprise applications
  • Expertise in the creation of on-prem and cloud data lakes
  • Experience working with Cloudera, Hortonworks and Pivotal Distributions of Hadoop
  • Expertise in HDFS, MapReduce, Spark, Hive, Impala, Pig, Sqoop, HBase, Oozie, Flume, Kafka and various other ecosystem components
  • Expertise in Spark framework for batch and real time data processing
  • Experience working with BI teams to translate big data requirements into Hadoop-centric technologies.
  • Experience in performance tuning the Hadoop cluster by gathering and analyzing information about the existing infrastructure.
  • Working experience designing and implementing complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, Oozie, Flume and ZooKeeper.
  • Experience in converting MapReduce applications to Spark.
  • Experience in handling messaging services using Apache Kafka.
  • Experience in working with Flume to load the log data from multiple sources directly into HDFS
  • Experience in data migration from existing data stores and mainframe NDM (Network Data Mover) to Hadoop
  • Good knowledge of NoSQL databases: Cassandra, MongoDB and HBase.
  • Experience in handling multiple relational databases: MySQL, SQL Server, PostgreSQL and Oracle.
  • Experience in supporting data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud, including exporting and importing data into S3.
  • Experience in designing both time driven and data driven automated workflows using Oozie.
  • Experience in supporting analysts by administering and configuring Hive.
  • Experience in running Pig and Hive scripts.
  • Experience in fine-tuning MapReduce jobs for better scalability and performance.
  • Developed various MapReduce applications to perform ETL workloads on terabytes of data.
  • Imported data into and exported data from HDFS and Hive using Sqoop.
  • Experience in writing shell scripts to dump the sharded data from landing zones to HDFS.
  • Worked on predictive modeling techniques like Neural Networks, Decision Trees and Regression Analysis.
  • Experience in data mining and Business Intelligence tools such as Tableau, SAS Enterprise Miner, JMP and Enterprise Guide, IBM SPSS Modeler and MicroStrategy.

TECHNICAL SKILLS

Hadoop Ecosystem Development: HDFS, MapReduce, Spark, Hive, Pig, Flume, Oozie, ZooKeeper, HBase, Cassandra, Kafka, Solr, HCatalog, Sqoop.

Operating System: Linux, Windows XP, Server 2003, Server 2008.

Databases: MySQL, Oracle, MS SQL Server, PostgreSQL, MS Access

Languages: C, Java, Python, SQL, Pig, UNIX shell scripting

PROFESSIONAL EXPERIENCE

Confidential, Jersey City, NJ

Hadoop Developer

Responsibilities:

  • Worked on the creation of business rules in Pig
  • Imported data from legacy systems to Hadoop using Sqoop and Apache Camel
  • Used Pig for data transformation
  • Used Apache Spark for real time and batch processing
  • Used Apache Kafka to handle log messages consumed by multiple systems
  • Used shell scripting extensively for data munging
  • Worked on HCatalog, which allows Pig and MapReduce to take advantage of the SerDe data format transformation definitions already written for Hive
  • Worked on DevOps tools like Chef, Artifactory and Jenkins to configure and maintain the production environment
  • Used Pig to transform data into various formats
  • Stored processed tables in Cassandra from HDFS for applications to access the data in real time
  • Used Solr on Cassandra for implementation of near real-time search
  • Worked on writing UDFs in Java for Pig
  • Created ORCFile tables from the existing non-ORCFile Hive tables

Environment: Hortonworks Data Platform 2.2, Pig, Hive, Spark, Kafka, Cassandra, Sqoop, Apache Camel, Apache Crunch, HCatalog, Chef, Jenkins, Artifactory, Avro, IBM Data Studio

Confidential, Piscataway, NJ

Hadoop Developer

Responsibilities:

  • Worked on the creation of on-premise and cloud data lakes from the start with the Pivotal distribution
  • Imported data from various relational data stores to HDFS using Sqoop
  • Collected user activity data, log data using Kafka for real time analytics
  • Implemented batch processing using Spark
  • Converted Hive tables to HAWQ for higher query performance
  • Responsible for loading unstructured and semi-structured data into Hadoop cluster from different data sources using Flume
  • Used the Hive data warehouse tool to analyze the data in HDFS and developed Hive queries
  • Used the RegEx, JSON, Parquet and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data
  • Implemented Hive and Pig custom UDFs to achieve comprehensive data analysis
  • Used Pig to develop ad-hoc queries
  • Exported the business-required information to RDBMS using Sqoop to make the data available for the BI team to generate reports
  • Implemented daily workflow for extraction, processing and analysis of data with Oozie
  • Responsible for troubleshooting Spark/MapReduce jobs by reviewing the log files
  • Used Tableau for visualization and report generation

Environment: Pivotal HD 2.0, Gemfire XD, MapReduce, Spark, Pig, Hive, Kafka, Sqoop, HBase, Cassandra, Flume, Oozie, Tableau, Aspera, AWS, HCatalog

Confidential, Minneapolis, Minnesota

Hadoop Developer

Responsibilities:

  • Imported data from our relational data stores to Hadoop using Sqoop.
  • Created various MapReduce jobs for performing ETL transformations on the transactional and application-specific data sources.
  • Wrote Pig scripts and executed them using the Grunt shell.
  • Performed big data analysis using Pig and user-defined functions (UDFs).
  • Worked on loading tables to Impala for faster retrieval using different file formats.
  • Performance tuning of queries in Impala for faster retrieval.
  • The system was initially developed using Java. The Java filtering program was restructured to have the business rule engine in a JAR that can be called from both Java and Hadoop.
  • Created Reports and Dashboards using structured and unstructured data.
  • Upgraded the operating system and/or Hadoop distribution using Puppet as new versions were released.
  • Performed joins, group by and other operations in MapReduce by using Java and Pig.
  • Worked with Amazon Web Services (AWS), a complete set of infrastructure and application services for running workloads in the cloud, from enterprise applications to big data projects.
  • Processed the output from Pig and Hive and formatted it before sending it to the Hadoop output file.
  • Used Hive definitions to map the output file to tables.
  • Set up and benchmarked Hadoop/HBase clusters for internal use
  • Wrote data ingesters and MapReduce programs
  • Reviewed the HDFS usage and system design for future scalability and fault tolerance
  • Wrote MapReduce/HBase jobs
  • Worked with the HBase NoSQL database.

Environment: Hadoop, Java 1.5, UNIX, Shell Scripting, XML, HDFS, HBase, NoSQL, MapReduce, Hive, Impala, Pig.

Confidential, Bluebell, PA

Hadoop Consultant

Responsibilities:

  • Responsible for installing and configuring Hadoop MapReduce and HDFS; also developed various MapReduce jobs for data cleaning
  • Installed and configured Hive to create tables for the unstructured data in HDFS
  • Strong expertise in major components of the Hadoop ecosystem including Hive, Pig, HBase, HBase-Hive integration, Sqoop and Flume.
  • Involved in loading data from UNIX file system to HDFS
  • Responsible for managing and scheduling jobs on Hadoop Cluster
  • Responsible for importing and exporting data into HDFS and Hive using Sqoop
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data
  • Experienced in managing Hadoop log files
  • Worked on managing data coming from different sources
  • Wrote HQL queries to create tables and loaded data from HDFS to make it structured
  • Loaded and transformed large sets of structured, semi-structured and unstructured data
  • Worked extensively with Hive to transform files from different analytical formats to text (.txt) files, making the data viewable for further analysis
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs
  • Wrote and modified stored procedures to load and modify data according to the project requirements
  • Responsible for developing Pig Latin scripts to extract data from the web server output files and load it into HDFS
  • Extensively used Flume to collect the log files from the web servers and then integrated these files into HDFS
  • Responsible for implementing schedulers on the JobTracker, enabling them to effectively use the resources available in the cluster for any given MapReduce job.
  • Constantly worked on tuning the performance of the queries in Hive and Pig, making the queries more efficient in processing and retrieving the data
  • Supported MapReduce programs running on the cluster
  • Created external tables in Hive and loaded the data into these tables
  • Hands on experience in database performance tuning and data modeling
  • Monitored the cluster coordination using ZooKeeper

Environment: Hadoop, HDFS, MapReduce, Hortonworks, Hive, Java (JDK 1.6), DataStax, Flat files, UNIX Shell Scripting, Oracle 11g/10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT.

Confidential, Pittsburgh, PA

Sr. Java Developer

Responsibilities:

  • Developed a detailed design document based on design discussions.
  • Involved in designing the database tables and Java classes used in the application.
  • Involved in development, unit testing and system integration testing of the travel network builder side of the application.
  • Involved in the design, development and building of the travel network file system to be stored on NAS drives.
  • Set up the Linux environment to interact with the Route-smart library (.so) file and NAS drive file operations using JNI.
  • Implemented and configured Hudson as the continuous integration server and Sonar for maintaining code and removing redundant code.
  • Worked with Route-smart C++ code to interact with the Java application using SWIG and the Java Native Interface.
  • Developed the user interface for requesting a travel network build using JSP and Servlets.
  • Built business logic so users can specify which version of the travel network files is used for the solve process.
  • Used Spring Data Access Object to access the data with a data source.
  • Built an independent property sub-system to ensure that the request always picks the latest set of properties.
  • Implemented a thread monitor system to monitor threads. Used JUnit for unit testing of the development modules.
  • Wrote SQL queries and procedures for the application and interacted with third-party ESRI functions to retrieve map data.
  • Built and deployed JAR, WAR and EAR files on dev and QA servers.
  • Provided bug fixing (using Log4j for logging) and testing support after development.
  • Prepared requirements and research to move the map data using the Hadoop framework for future usage.

Environment: Java 1.6.21, J2EE, Oracle 10g, Log4J 1.17, Windows 7 and Red Hat Linux, Subversion, Spring 3.1.0, Icefaces 3, ESRI, Weblogic 10.3.5, Eclipse Juno, Junit 4.8.2, Maven 3.0.3, Hudson 3.0.0 and Sonar 3.0.0

Confidential

Java Developer

Responsibilities:

  • Involved in Requirements gathering, Requirement analysis, Design, Development, Integration and Deployment.
  • Involved in Order Placement / Order Processing module.
  • Responsible for the design and development of the customizations framework
  • Designed and developed UIs using JSP by following MVC architecture.
  • Developed the application using the Struts framework. The views are programmed using JSP pages with the Struts tag library, the model is a combination of EJBs and Java classes, and the web implementation controllers are Servlets.
  • Used EJB as a middleware in designing and developing a three-tier distributed application.
  • Used the Java Message Service (JMS) API to allow application components to create, send, receive, and read messages.
  • Used JUnit for unit testing of the system and Log4J for logging.
  • Created and maintained data using Oracle database and used JDBC for database connectivity.
  • Created and implemented Oracle stored procedures and triggers.
  • Installed WebLogic Server for handling HTTP requests and responses. The request and response from the client are controlled using session tracking in JSP.
  • Worked on front-end technologies like HTML, JavaScript, CSS and JSP pages using JSTL tags.
  • Reported daily on the team's progress to the Project Manager and Team Lead.

Environment: Core Java, J2EE 1.3, JSP 1.2, Servlets 2.3, EJB 2.0, Struts 1.1, JNDI 1.2, JDBC 2.1, Oracle 8i, UML, DAO, JMS, XML, WebLogic 7.0, MVC Design Pattern, Eclipse 2.1, Log4j and JUnit.
