Hadoop Developer Resume
Chicago, IL
SUMMARY
- 7 years of professional IT experience, including more than 3 years in the Big Data ecosystem.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, ZooKeeper, Pig, and Flume.
- Experience in managing and reviewing Hadoop log files.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Extended Hive and Pig core functionality by writing custom UDFs (a minimal sketch follows this summary).
- Experience in building data pipelines using Kafka and Spark.
- Experience in data management and implementation of Big Data applications using Hadoop frameworks.
- Experience in designing, developing and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Strong experience as a Java Developer in Web/intranet, client/server technologies using Java, J2EE, Servlets, JSP, JSF, EJB, JDBC and SQL.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
- Proficient in UNIX Shell scripting.
- Excellent working knowledge of popular frameworks like Struts, Hibernate, and Spring MVC.
- Experience in Agile Engineering practices.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, producing estimates, designing custom solutions, development, leading developers, producing documentation, and production support.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with problem-solving and leadership skills.
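A minimal sketch of the kind of custom Hive UDF referenced above; the class name and normalization logic are illustrative assumptions rather than code from a specific engagement:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: normalizes free-text state codes (" il " -> "IL")
// so downstream joins and aggregations see consistent keys.
public final class NormalizeState extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;  // preserve Hive NULL semantics
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

Once packaged, a UDF like this is registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION, after which it can be called like any built-in function in HiveQL.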
TECHNICAL SKILLS
Big Data Technologies: Hadoop, HDFS, Hive, Spark, MapReduce, Pig, Sqoop, Flume, ZooKeeper, Oozie
Scripting Languages: JavaScript, HTML, XML, UNIX shell scripting
Programming Languages: C, C++, Java
Java/J2EE Technologies: Java, Java Beans, J2EE (JSP, Servlets, EJB), Struts, Spring, JDBC
DB Languages: SQL, PL/SQL
NoSQL Databases: HBase, MongoDB
Operating Systems: Linux, UNIX, and Windows variants
PROFESSIONAL EXPERIENCE
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
- Involved in managing and reviewing Hadoop log files.
- Created Hive generic UDFs, UDAFs, and UDTFs to process business logic that varies by policy.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Worked extensively with Sqoop for importing data from Oracle to HDFS.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
- Created and exposed Hive views through Impala for business users.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Responsible for handling disk failures.
- Reconfigured YARN and Hive memory parameters for the cluster.
- Experienced using different data file formats such as Avro and Parquet.
- Orchestrated Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows.
- Worked on a Data Lake architecture to build a reliable, scalable analytics platform meeting batch, interactive, and online analytics requirements.
- Worked on performance tuning of Hadoop jobs by applying techniques such as map-side joins, partitioning, and bucketing.
- Hands-on experience in developing applications using Spark, Spark SQL, and Spark Streaming.
- Involved in migrating MapReduce jobs to Spark jobs; used Spark SQL and the DataFrames API to load structured and semi-structured data into Spark clusters (a Spark SQL sketch follows this list).
- Involved in the requirement and design phases to implement a streaming architecture for real-time processing using Spark and Kafka (a streaming sketch also follows).
- Integrated Tableau with Hadoop data sources to build dashboards providing various insights into the organization's sales.
- Developed MapReduce programs in Java to perform various transformation, cleaning, and scrubbing tasks.
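A minimal sketch of the MapReduce-to-Spark migration pattern described above; the input path, column names, and aggregation are assumptions for illustration:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SalesAggregation {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("SalesAggregation")
                .getOrCreate();

        // Hypothetical input: semi-structured JSON sales records in HDFS.
        Dataset<Row> sales = spark.read().json("hdfs:///data/sales/");

        // The same group-and-sum that a MapReduce job would express as a
        // mapper emitting (region, amount) pairs and a summing reducer.
        sales.groupBy("region")
             .sum("amount")
             .write()
             .mode("overwrite")
             .parquet("hdfs:///output/sales_by_region/");

        spark.stop();
    }
}
```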
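And a minimal sketch of the Spark-and-Kafka streaming ingestion mentioned above, assuming the spark-streaming-kafka-0-10 integration, a hypothetical events topic, and placeholder broker settings:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class EventIngest {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("EventIngest");
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");  // placeholder
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "event-ingest");

        // Subscribe to a hypothetical topic and count records per batch.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Arrays.asList("events"), kafkaParams));

        stream.map(ConsumerRecord::value).count().print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```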
Environment: Linux, Cloudera, Hadoop, Splunk, Hive, Pig, Oozie, Hue, MySQL, Oracle, Sqoop, Puppet, Impala, Spark
Confidential, Jacksonville, FL
Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Responsible for importing log files from various sources into HDFS using Flume.
- Imported data using Sqoop to load data from MySQL into HDFS on a regular basis.
- Created a POC to store server log data in MongoDB to identify system alert metrics.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Moved relational database data into Hive dynamic partition tables using Sqoop and staging tables.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats, such as JSON and XML.
- Experienced with different compression techniques, such as LZO, GZip, and Snappy.
- Created a data pipeline of MapReduce programs using chained mappers.
- Implemented an optimized join over different data sets to get the top claims by state using MapReduce.
- Implemented MapReduce programs to perform map-side joins using the distributed cache in Java (a sketch follows this list).
- Developed unit test cases using the JUnit, EasyMock, and MRUnit testing frameworks (a sample MRUnit test also follows).
- Experience in upgrading the Hadoop cluster (HBase/ZooKeeper) from CDH3 to CDH4.
- Created a complete processing engine based on Cloudera's distribution, tuned for performance.
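A minimal sketch of the map-side join technique referenced above; the lookup file, record layout, and field positions are assumptions for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical map-side join: a small "state code -> state name" lookup
// file is shipped to every mapper via the distributed cache, so claim
// records can be joined without a reduce phase.
public class ClaimStateMapper
        extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> states = new HashMap<>();

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // Files registered in the driver with job.addCacheFile(...).
        URI[] cached = context.getCacheFiles();
        FileSystem fs = FileSystem.get(context.getConfiguration());
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path(cached[0]))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",");  // e.g. "IL,Illinois"
                states.put(parts[0], parts[1]);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");  // claimId,state,...
        String stateName = states.getOrDefault(fields[1], "UNKNOWN");
        context.write(new Text(fields[0]), new Text(stateName));
    }
}
```

In the driver, the lookup file would be registered before job submission with job.addCacheFile(new URI("/lookup/states.csv")), where the path is again a placeholder.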
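And a minimal MRUnit sketch of the unit-testing approach mentioned above; the mapper under test is a hypothetical claim counter defined inline to keep the example self-contained:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class ClaimCountMapperTest {

    // Hypothetical mapper: emits (state, 1) for each claim record.
    public static class ClaimCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");  // claimId,state,amount
            ctx.write(new Text(fields[1]), new IntWritable(1));
        }
    }

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new ClaimCountMapper());
    }

    @Test
    public void emitsStateWithUnitCount() throws IOException {
        mapDriver.withInput(new LongWritable(0), new Text("C100,IL,2500"))
                 .withOutput(new Text("IL"), new IntWritable(1))
                 .runTest();
    }
}
```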
Environment: Hadoop, HDFS, HBase, MapReduce, Java, Hive, Pig, Sqoop, Flume, Oozie, Hue, SQL, ETL, Cloudera Manager, MySQL, MongoDB
Confidential, TX
Hadoop Developer
Responsibilities:
- Gathered and analyzed log files through MapReduce jobs to understand the customer base, buying habits, promotional effectiveness, inventory management, and buying decisions.
- Installed and configured Hadoop on multiple nodes on AWS EC2 and the Cloudera platform.
- Set up and optimized standalone, pseudo-distributed, and fully distributed clusters.
- Analyzed data with Hive, Pig, and Hadoop Streaming.
- Built, tuned, and maintained HiveQL and Pig scripts for reporting purposes.
- Helped develop MapReduce programs and define job flows.
- Managed and reviewed Hadoop log files.
- Supported and troubleshot MapReduce programs running on the cluster.
- Loaded data from the Linux/UNIX file system into HDFS (a sketch using the HDFS Java API follows this list).
- Loaded log data into HDFS using Flume.
- Installed and configured Hive and wrote Hive UDFs.
- Created tables, loaded data, and wrote queries in Hive.
- Developed scripts to automate routine DBA tasks (e.g., database refreshes, backups, and monitoring) using Linux/UNIX shell scripts and Python.
- Tuned and modified SQL for batch and online processes.
- Managed the cluster through performance tuning and enhancement.
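A minimal sketch of loading local files into HDFS with the Hadoop FileSystem Java API, the programmatic equivalent of hadoop fs -put; both paths are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLoader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // picks up core-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path local = new Path("/var/log/app/access.log");  // placeholder
        Path remote = new Path("/data/raw/logs/");         // placeholder

        // Copy the local log file into the HDFS landing directory.
        fs.copyFromLocalFile(local, remote);
        fs.close();
    }
}
```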
Environment: Hadoop (HDFS), MapReduce, AWS, Hive, Java (JDK 1.6), Flat/XML/JSON Files, PSQL, Linux/UNIX Shell Scripting
Confidential
Database Developer
Responsibilities:
- Responsible for extracting data from Excel and flat files into SQL Server using SQL Server Integration Services (SSIS).
- Created databases and schema objects including tables and indexes, applied constraints, connected various applications to the database, and wrote functions, stored procedures, and triggers.
- Responsible for creating database users and granting permissions on various tables.
- Responsible for analyzing data and writing complex joins and nested queries to get the relevant information.
- Created and modified SQL Server stored procedures (a JDBC invocation sketch follows this list).
- Scheduled daily, weekly, and monthly reports for executives and customer representatives across various categories and regions, based on business needs, using SQL Server Reporting Services (SSRS).
- Optimized long-running report queries on SQL Server 2008/2005.
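A minimal Java/JDBC sketch of invoking a stored procedure like those described above; the procedure name, parameter, connection string, and credentials are all assumptions:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ReportRunner {
    public static void main(String[] args) throws Exception {
        // Hypothetical SQL Server connection string.
        String url = "jdbc:sqlserver://dbhost:1433;databaseName=Sales";

        try (Connection con = DriverManager.getConnection(url, "user", "pass");
             CallableStatement cs =
                     con.prepareCall("{call usp_RegionSales(?)}")) {
            cs.setString(1, "Midwest");  // assumed region parameter
            try (ResultSet rs = cs.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("category")
                            + "\t" + rs.getBigDecimal("total"));
                }
            }
        }
    }
}
```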
Environment: SQL, SQL Server Management Studio 2008 R2, Business Intelligence Development Studio, Visual Studio
Confidential
JAVA/J2EE Developer
Responsibilities:
- Involved in different phases of Software Development Lifecycle (SDLC) of the application, like requirements gathering, analysis, design, development and deployment of the application.
- Implemented the Model-View-Controller (MVC) design pattern with Struts MVC, Servlets, JSP, HTML, AJAX, JavaScript, and CSS to control the flow of the application across the presentation/web tier, the application/business layer (JDBC), and the data layer (Oracle 10g).
- Analysis, design, and implementation of software applications using Java, J2EE, XML, and XSLT.
- Developed ActionForms and controllers in the Struts 2.0/1.2 framework.
- Utilized various Struts features, such as Tiles, tag libraries, and declarative exception handling via XML, for the design.
- Created XML schemas and XML templates and used the SAX/DOM APIs to parse them.
- Implemented design patterns such as Data Access Object (DAO), Value Object/Data Transfer Object (DTO), and Singleton (a DAO/DTO sketch follows this list).
- Developed JavaScript validations on order submission forms.
- Designed, developed and maintained the data layer using Hibernate.
- Used JUnit for unit testing of the application.
- Used Apache Ant to compile Java classes and package them into JAR archives.
- Used ClearQuest to track application bugs and to coordinate with the testing team.
- Involved in tracking and resolving defects arising in the QA and production environments.
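A minimal sketch of the DAO/DTO pattern referenced above, using plain JDBC; the table, columns, and class names are illustrative assumptions:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// DTO: a plain value holder passed between application layers.
class OrderDTO {
    private int id;
    private String status;
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}

// DAO: isolates persistence logic from the business tier.
class OrderDAO {
    private final DataSource dataSource;

    OrderDAO(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    OrderDTO findById(int id) throws SQLException {
        String sql = "SELECT id, status FROM orders WHERE id = ?";  // assumed schema
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;  // no matching order
                }
                OrderDTO dto = new OrderDTO();
                dto.setId(rs.getInt("id"));
                dto.setStatus(rs.getString("status"));
                return dto;
            }
        }
    }
}
```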