
Hadoop Developer Resume


Peachtree City, GA

SUMMARY

  • Versatile, dynamic and technically competent problem solver with over 7 years of experience in Hadoop and Big Data (4 years) and Java/J2EE (3+ years) technologies.
  • Expertise in design and implementation of Big Data solutions in the Retail, Finance and E-commerce domains.
  • Hands-on experience in installation, configuration, support and management of Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
  • Sound knowledge of Hadoop Architecture, Administration, HDFS Federation and High Availability, and the Streaming API, along with Data Warehousing concepts.
  • Experienced in understanding complex Big Data processing needs and developing MapReduce jobs (in Java) and Scala code and modules to address those needs (a minimal MapReduce sketch follows this list).
  • Experience handling data accuracy, scalability and integrity on Hadoop platforms.
  • Experience with complex data processing pipelines, including ETL and Data Ingestion dealing with unstructured and semi-structured data.
  • Knowledge of Apache Spark and Scala, mainly from framework exploration for the transition from Hadoop/MapReduce to Spark.
  • Knowledge of designing and implementing ETL processes to load data from various data sources into HDFS using Flume and Sqoop, performing transformation logic using Hive and Pig, and integrating with BI tools for visualization/reporting.
  • Solid understanding of NoSQL databases like MongoDB, HBase and Cassandra.
  • Expertise in performing Large-scale Web crawling with Apache Nutch using a Hadoop/HBase cluster.
  • Knowledge in Job Workflow Scheduling and Monitoring tools like Oozie and Zookeeper.
  • Experience with working on the AWS cloud environment.
  • Excellent Java development skills using J2EE, Spring, J2SE, Servlets, JUnit, MRUnit, JSP and JDBC.
  • Experienced in working with various frameworks like Struts, Spring, Hibernate, EJB and JSF.
  • Professional knowledge of UNIX, shell and Perl scripting.
  • Knowledge of Data Warehousing and ETL Tools like Informatica and Pentaho.
  • Hands-on experience writing code in Scala.
  • Experience in Object Oriented Analysis and Design (OOAD) and development of software using UML Methodology.
  • Experienced in Agile Scrum, RUP and TDD software development methodologies.
  • Possess strong commitment to team environment dynamics with the ability to lead, contribute expertise and follow leadership directives at appropriate times.
  • Effectively used Oozie to develop automated workflows for Sqoop, MapReduce and Hive jobs.
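
For illustration (not part of the original resume): a minimal sketch of the kind of Java MapReduce job described above, counting term occurrences, with a combiner to cut shuffle traffic. Class names and paths are hypothetical.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TrendCount {

      public static class TokenMapper
          extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text term = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
            term.set(token);
            ctx.write(term, ONE);               // emit (term, 1) per token
          }
        }
      }

      public static class SumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) sum += v.get();
          ctx.write(key, new IntWritable(sum)); // total occurrences per term
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "trend-count");
        job.setJarByClass(TrendCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // combiner reduces shuffle volume
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }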

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Spark, Splunk, Impala, Kafka, Talend, Oozie, ZooKeeper, Flume, Storm, AWS, EC2, EMR.

Programming Languages: Java, Scala, Python, C/C++, PL/SQL.

Scripting Languages: PHP, jQuery, JavaScript, XML, HTML, Bash, Ajax and CSS.

UNIX Tools: Apache, YUM, RPM.

J2EE Technologies: Servlets, JSP, JDBC, EJB, & JMS.

Databases: NoSQL (MongoDB & Cassandra), Oracle.

Data Integration Tools: Informatica, Pentaho.

Methodologies: Agile, Scrum, SDLC, UML, Design Patterns.

IDEs: Eclipse, NetBeans, WSAD, RAD.

Platforms: Windows, Linux, Solaris, AIX, HP-UX, CentOS.

Application Servers: Apache Tomcat, WebLogic, WebSphere, JBoss 4.0.

Frameworks: Spring, MVC, Hibernate, Struts, Log4j, JUnit, Web Services.

PROFESSIONAL EXPERIENCE

Confidential, Peachtree City, GA

Hadoop Developer

Responsibilities:

  • Working extensively on creating MapReduce jobs for search and analytics to identify various trends across the data for the Infotainment product line.
  • Working on data analytics using Pig and Hive; Hive made it easier to extract information from very old data.
  • Designing the adaptive ecosystem to ensure that the archived data was accessible through third-party BI tools.
  • Using Oozie for workflow orchestration to automate MapReduce, Pig and Hive jobs.
  • Installing and configuring the Hadoop cluster and developing multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Analyzing information from the vehicle-mounted Dedicated Short Range Communication unit using Pig and Hive, making it easier to monitor vehicles and road status.
  • Responsible for optimizing data transfer across the network using Combiners, joining multiple-schema datasets using joins, and organizing data using partitions and buckets.
  • Writing jobs in Scala for the company’s parallel data processing center located in the vicinity.
  • Moving large datasets hourly in the Avro file format and running Hive and Impala queries against them.
  • Working on importing data into HBase using the HBase shell and the HBase Client API (see the sketch after this list).
  • Capturing archived data from existing relational database into HDFS using Sqoop.
  • Installing and configuring a remote Hive metastore for both development and production jobs as required.
  • Coordinating the cluster services using Zookeeper.
  • Improving system performance by working with the development team to analyze, identify and resolve issues quickly.
  • Storing the geographically pre-distributed datasets in Cassandra.
  • Capturing the data logs from web server into HDFS using Flume for analysis.
  • Writing Pig scripts and implementing business logic using Pig UDFs to pre-process the data for analysis.
  • Managing and reviewing Hadoop log files, thereby keeping track of nodes’ health.
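
For illustration: a hedged sketch of writing and reading a row through the HBase Client API, shown with the newer Connection/Table interface (the CDH4-era HTable API is analogous). Table, family and row-key names are hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseLoad {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("vehicle_events"))) {

          // Write one cell: row key -> column family "d", qualifier "status"
          Put put = new Put(Bytes.toBytes("vin-123|2014-06-01"));
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("OK"));
          table.put(put);

          // Read the same cell back
          Result r = table.get(new Get(Bytes.toBytes("vin-123|2014-06-01")));
          byte[] status = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("status"));
          System.out.println(Bytes.toString(status));
        }
      }
    }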

Environment: CDH4 with Hadoop 2.x, HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, Scala, Zookeeper, HBase, Cassandra, Flume, Servlets, JSPs, JSTL, HTML, JavaScript.

Confidential, Cupertino, CA

Hadoop Consultant

Responsibilities:

  • Used Cloudera Manager for Hadoop cluster administration, which includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
  • Developed efficient MapReduce programs for data cleaning and structuring using Java and Python.
  • Supported the team in Code/Design analysis, Strategy development and Project planning.
  • Modeled the data and made it queryable using a unified query service.
  • Developed Hive queries for data sampling and pre-analysis before submitting to the analysts.
  • Implemented Kafka/Storm topologies capable of handling and channeling high-volume data streams, and integrated the Storm topologies with Esper to filter and process that data across multiple clusters for complex event processing.
  • Registered, ingested, validated, stored and archived the data in its native form.
  • Used Oozie to automate and schedule business workflows invoking Sqoop, MapReduce and Pig jobs as per the requirements.
  • Used Cassandra to store the majority of the data, which needed to be divided regionally.
  • Worked with Splunk, leveraging it for archived and specialized analytics on Hadoop data.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Cleansed, enriched, transformed, and analyzed the data through hosted compute engines.
  • Used Apache Spark for performance optimization and parallel data processing (a minimal sketch follows this list).
  • Developed Sqoop scripts to import and export data from relational sources and handled incremental loading on the customer and transaction data by date.
  • Implemented business logic by writing Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources; also developed Pig UDFs for pre-processing (see the UDF sketch after this list).
  • Created HBase tables to load large, disparate datasets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Responsible for automation to add data nodes as needed.
  • Worked with various HDFS file formats like Avro and SequenceFile, and compression formats like Snappy and bzip2.
  • Identified several PL/SQL batch applications in General Ledger processing and conducted performance comparison to demonstrate the benefits of migrating to Hadoop.
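
For illustration: a minimal sketch of parallel aggregation with the Spark Java API, as referenced above. The input path and field layout are hypothetical.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import scala.Tuple2;

    public class TransactionTotals {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("transaction-totals");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
          // Hypothetical CSV input: customerId,timestamp,amount
          JavaRDD<String> lines = sc.textFile("hdfs:///data/transactions");
          JavaPairRDD<String, Double> totals = lines
              .map(line -> line.split(","))
              .filter(f -> f.length >= 3)       // drop malformed rows
              .mapToPair(f -> new Tuple2<>(f[0], Double.parseDouble(f[2])))
              .reduceByKey(Double::sum);        // aggregate amounts in parallel
          totals.saveAsTextFile("hdfs:///out/totals");
        }
      }
    }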
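For illustration: a minimal Java eval UDF for Pig of the kind described above; the normalization logic is hypothetical. In Pig Latin it would be registered with REGISTER myudfs.jar and invoked as NormalizeField(field).

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class NormalizeField extends EvalFunc<String> {
      @Override
      public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
          return null;                       // Pig treats null as missing data
        }
        // Trim and lower-case the field so joins and group-bys match reliably
        return ((String) input.get(0)).trim().toLowerCase();
      }
    }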

Environment: Hadoop 2.0, MapReduce, HDFS, Hive, Java, Cloudera, Pig, HBase, Kafka, Storm, Splunk, MySQL Workbench, Eclipse, Oracle 10g, PL/SQL, SQL*Plus.

Confidential, Bloomington, IL

Hadoop/Big data Developer

Responsibilities:

  • Designed, developed and supported a MapReduce-based data processing pipeline to process a growing number of events from log files and messages per day.
  • Worked closely with client development staff to perform ad-hoc queries and data analysis on newly created cross-platform datasets using Apache Hive and Pig.
  • Used Pig as an ETL tool to perform transformations, event joins, bot-traffic filtering and some pre-aggregations before storing the data in HDFS.
  • Used Hive partitioning and bucketing to segregate the data and analyze it.
  • Incorporated various job flow mechanisms in Oozie to automate the workflow for extraction of data from warehouses and weblogs.
  • Implemented the open-source monitoring tool Ganglia for monitoring the various services across the cluster.
  • Collaborated with the administration team to set up a monitoring infrastructure for supporting and optimizing the Hadoop infrastructure.
  • Responsible for writing complex SQL queries involving multiple inner and outer joins.
  • Developed and supported a Scala-based data processing pipeline for one of the processing centers located in Sacramento.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Worked with the applications team to install the operating systems, Hadoop updates, patches and version upgrades as required.
  • Worked directly with stakeholders to closely understand the business needs.
  • Computed various metrics and loaded the aggregated data into DB2 for reporting on the dashboard (see the JDBC sketch after this list).
  • Designed the shell script for backing up important metadata and rotating the logs on a monthly basis.
  • Greatly sharpened business acumen with knowledge of health insurance, claim processing, fraud-suspect identification, the appeals process and other domains.
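
For illustration: a hedged sketch of loading aggregated metrics into DB2 over JDBC as one batched transaction, as referenced above. Host, table and column names are hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.Map;

    public class MetricsLoader {
      public static void load(Map<String, Long> metrics) throws Exception {
        String url = "jdbc:db2://db2host:50000/REPORTS"; // hypothetical host/db
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO DAILY_METRICS (METRIC_NAME, METRIC_VALUE) VALUES (?, ?)")) {
          conn.setAutoCommit(false);                     // batch as one transaction
          for (Map.Entry<String, Long> e : metrics.entrySet()) {
            ps.setString(1, e.getKey());
            ps.setLong(2, e.getValue());
            ps.addBatch();
          }
          ps.executeBatch();
          conn.commit();                                 // all rows land together
        }
      }
    }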

Environment: CDH4 with Hadoop 1.x, HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, Flume, Servlets, JSPs, JSTL, HTML, JavaScript, jQuery, CSS.

Confidential

Sr. Java/J2EE developer

Responsibilities:

  • Designed and developed a Struts-like MVC 2 web framework using the front-controller design pattern, which is used successfully in a number of production systems (a minimal sketch follows this list).
  • Spearheaded the “Quick Wins” project by working very closely with the business and end users to improve the current website's ranking from 23rd to 6th in just 3 months.
  • Normalized Oracle database, conforming to design concepts and best practices.
  • Resolved product complications at customer sites and funneled the insights to the development and deployment teams to shape a long-term product development strategy with minimal roadblocks.
  • Built front end UI using JSP, Servlets, HTML and JavaScript to create user friendly and appealing interface.
  • Used JSTL and built custom tags whenever necessary.
  • Used Expression Language to tie beans to UI components.
  • Convinced business users and analysts of alternative solutions that are more robust and simpler to implement from a technical perspective while satisfying the functional requirements from the business perspective.
  • Applied design patterns and OO design concepts to improve the existing Java/JEE-based code base.
  • Identified and fixed transactional issues due to incorrect exception handling and concurrency issues due to unsynchronized blocks of code.
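
For illustration: a minimal front-controller servlet in the spirit of the MVC 2 framework described above, written against the modern servlet API for readability. One entry point maps request paths to views and forwards to JSPs; all names are hypothetical.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Mapped to a wildcard path such as /app/* in web.xml (hypothetical)
    public class FrontController extends HttpServlet {
      private final Map<String, String> views = new HashMap<>();

      @Override
      public void init() {
        // Path -> view mapping; a real framework would load this from config
        views.put("/home", "/WEB-INF/jsp/home.jsp");
        views.put("/account", "/WEB-INF/jsp/account.jsp");
      }

      @Override
      protected void doGet(HttpServletRequest req, HttpServletResponse resp)
          throws ServletException, IOException {
        String view = views.get(req.getPathInfo());
        if (view == null) {
          resp.sendError(HttpServletResponse.SC_NOT_FOUND);
          return;
        }
        req.getRequestDispatcher(view).forward(req, resp); // dispatch to the view
      }
    }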

Environment: Java 1.2/1.3, Swing, Applet, Servlets, JSP, custom tags, JNDI, JDBC, XML, XSL, DTD, HTML, CSS, Java Script, Oracle, DB2, PL/SQL, WebLogic, JUnit, Log4J and CVS.

Confidential

Java/J2EE developer

Responsibilities:

  • Developed the user interface using JSP, HTML, CSS and JavaScript.
  • Responsible for gathering and analyzing the requirements for the project.
  • Created various Unified Modeling Language diagrams, such as use case and ER diagrams, for the project.
  • Used dependency injection in Spring for the service layer and DAO layer.
  • Implemented the J2EE architecture using Struts, based on the MVC 2 pattern.
  • Wrote Servlets and deployed them on WebSphere Application Server.
  • Created user validations on the client side as well as the server side.
  • Developed the Java classes to be used in JSPs and Servlets.
  • Extensively used JavaScript for client side validations.
  • Improved the coding standards, code reuse and participated in code-reviews.
  • Worked with PL/SQL scripts to gather data and perform data manipulations.
  • Used JDBC for database transactions (see the sketch after this list).
  • Involved in unit testing of the application.
  • Developed stored procedures in Oracle.
  • Used a Test-Driven Development approach and wrote many unit and integration tests.
  • Involved in analyzing how the requirements related to and depended on each other.
  • Onsite coordination for developing various modules.
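
For illustration: a hedged sketch of a JDBC transaction as referenced above; related updates commit together or roll back together. Table and column names are hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class TransferDao {
      public void transfer(String url, long fromId, long toId, double amount)
          throws SQLException {
        try (Connection conn = DriverManager.getConnection(url)) {
          conn.setAutoCommit(false);             // start the transaction
          try (PreparedStatement debit = conn.prepareStatement(
                   "UPDATE ACCOUNT SET BALANCE = BALANCE - ? WHERE ID = ?");
               PreparedStatement credit = conn.prepareStatement(
                   "UPDATE ACCOUNT SET BALANCE = BALANCE + ? WHERE ID = ?")) {
            debit.setDouble(1, amount);
            debit.setLong(2, fromId);
            debit.executeUpdate();
            credit.setDouble(1, amount);
            credit.setLong(2, toId);
            credit.executeUpdate();
            conn.commit();                       // both updates succeed together
          } catch (SQLException e) {
            conn.rollback();                     // undo partial work on failure
            throw e;
          }
        }
      }
    }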

Environment: Java 1.4, JSP 2.0, Servlets 2.4, JDBC, HTML, CSS, JavaScript, WebSphere 3.5.6, Eclipse, Oracle 9i.
