Sr Hadoop Developer Resume
Madison, WI
SUMMARY:
- Over eight years of IT experience with multinational clients, including three years of Hadoop architecture experience developing Big Data / Hadoop applications.
- Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Flume, Oozie and ZooKeeper)
- Well versed in configuring and administering Hadoop clusters using major Hadoop distributions such as Apache Hadoop and Cloudera
- Experience with IBM InfoSphere BigInsights and Cloudera distributions
- Proven expertise in performing analytics on Big Data using MapReduce, Hive and Pig.
- Experienced in performing real-time analytics on NoSQL databases such as HBase, MongoDB and Cassandra.
- Experienced with ETL processes to load data into Hadoop/NoSQL stores
- Experienced with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses
- Worked with the Oozie workflow engine to schedule time-based jobs that perform multiple actions.
- Experienced in importing and exporting data between RDBMS/Teradata and HDFS using Sqoop
- Analyzed large data sets by writing Pig scripts and Hive queries
- Handled logical implementation of and interaction with HBase
- Experienced in writing MapReduce programs and UDFs for both Hive and Pig in Java
- Used Flume to channel data from different sources to HDFS.
- Experience with configuration of Hadoop Ecosystem components: Hive, HBase, Pig, Sqoop, Mahout, Zookeeper and Flume.
- Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as JSON and Avro.
- Supported MapReduce programs running on the cluster and wrote custom MapReduce scripts for data processing in Java.
- Developed Spark jobs using Scala in a test environment for faster data processing, and used Spark SQL for querying
- Good knowledge of Apache Crunch and Hadoop HDFS admin shell commands.
- Experience testing MapReduce programs using MRUnit, JUnit and EasyMock.
- Experienced in implementing web-based, enterprise-level applications in Java using J2EE frameworks such as Spring, Hibernate, EJB, JMS and JSF.
- Experience with web-based UI development using jQuery, CSS, HTML5 and XHTML
- Experienced in implementing and consuming SOAP web services using Apache CXF with Spring, and consuming REST web services using HTTP clients.
- Experienced in writing functions, stored procedures, and triggers using PL/SQL.
- Experienced with build tools such as Ant and Maven and continuous integration tools such as Jenkins.
- Experienced in all facets of the Software Development Life Cycle (analysis, design, development, testing and maintenance) using Waterfall and Agile methodologies
- Motivated team player with excellent communication, interpersonal, analytical and problem-solving skills and a zeal to learn new technologies.
- Highly adept at promptly and thoroughly mastering new technologies with a keen awareness of new industry developments and the evolution of next generation programming solutions.
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop, MapReduce, Spark, Sqoop, Teradata, Hive, Oozie, Pig, HDFS, ZooKeeper, Flume
Languages: C, C++, Java, Python, PHP, SQL, PL/SQL, Pig Latin, HiveQL, Unix shell scripting
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
Frameworks: MVC, Spring, Hibernate, Struts 1/2, EJB, JMS, JUnit, MRUnit
NoSQL Databases: HBase, Cassandra and MongoDB
Databases: Oracle 11g/10g/9i, MySQL, Teradata, DB2, MS SQL Server
Application Servers: Apache Tomcat, JBoss, IBM WebSphere, WebLogic
Web Services: WSDL, SOAP, Apache CXF, Apache Axis, REST, Jersey
Methodologies: Scrum, Agile, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, Madison, WI
Sr Hadoop Developer
Responsibilities:
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data
- Imported data from Teradata using Sqoop with the Teradata connector.
- Integrated the Quartz scheduler with Oozie workflows to pull data from multiple data sources in parallel using fork actions.
- Processed input from multiple data sources in the same reducer using GenericWritable and MultipleInputs.
- Created a data pipeline of MapReduce programs using chained mappers.
- Exported the analyzed patterns back to Teradata using Sqoop.
- Familiarity with NoSQL databases such as MongoDB.
- Implemented optimized joins over different data sets to get top claims by state using MapReduce.
- Worked on Big Data processing of clinical and non-clinical data using MapReduce.
- Implemented complex MapReduce programs in Java that perform map-side joins using the distributed cache (sketched at the end of this list).
- Worked on implementing Spark with Scala.
- Responsible for importing log files from various sources into HDFS using Flume.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Used Sqoop to load data from MySQL into HDFS on a regular basis.
- Created partitions and buckets based on state for further processing using bucketed Hive joins.
- Created Hive generic UDFs to process business logic that varies by policy.
- Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
- Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Experienced with different compression techniques such as LZO, gzip and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Developed unit test cases using the JUnit, EasyMock and MRUnit testing frameworks.
- Monitored the cluster using Cloudera Manager.
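The map-side join bullet above follows the pattern sketched below: a small state lookup file distributed via the distributed cache is loaded into memory in each mapper and joined against claim records. The class name, file layout and field positions are illustrative assumptions, not taken from the actual project.

```java
// Map-side join sketch: the small lookup file (stateCode,stateName) is assumed to have
// been added to the distributed cache when the job was submitted.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ClaimStateJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Small dimension table (state code -> state name), loaded once per mapper.
    private final Map<String, String> states = new HashMap<String, String>();

    @Override
    protected void setup(Context context) throws IOException {
        Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if (cached != null && cached.length > 0) {
            BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",");
                    states.put(parts[0], parts[1]);        // e.g. "WI,Wisconsin"
                }
            } finally {
                reader.close();
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed claim record layout: claimId,stateCode,amount
        String[] fields = value.toString().split(",");
        String stateName = states.get(fields[1]);
        if (stateName != null) {                           // the join happens entirely map-side
            context.write(new Text(stateName), new Text(fields[0] + "," + fields[2]));
        }
    }
}
```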
Environment: Hadoop, HDFS, HBase, MongoDB, Spark, MapReduce, Teradata, Java, Hive, Pig, Sqoop, Flume, Oozie, Hue, SQL, ETL, Cloudera Manager, MySQL.
Confidential, Austin, TX
Hadoop Developer
Responsibilities:
- Worked on importing data from various sources and performed transformations using MapReduce and Hive to load data into HDFS
- Set up, maintained and upgraded the BigInsights Hadoop cluster
- Responsible for building scalable distributed data solutions using Hadoop
- Wrote various Hive and Pig scripts
- Created HBase tables to store variable data formats coming from different portfolios
- Performed real-time analytics on HBase using the Java API and REST API.
- Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
- Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows.
- Worked on setting up Pig, Hive and HBase on multiple nodes and developed solutions using Pig, Hive, HBase and MapReduce
- Worked on compression mechanisms to optimize MapReduce Jobs
- Analyzed customer behavior by performing clickstream analysis, using Flume to ingest the data.
- Worked on Avro data files using the Avro serialization system.
- Solved the small-files problem by packing small files into SequenceFiles for MapReduce processing (sketched at the end of this list).
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Worked on Oozie workflows to run multiple jobs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
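A minimal sketch of the small-files consolidation mentioned above: many small HDFS files are packed into a single SequenceFile, one (file name, contents) record per file, so MapReduce can process them efficiently. The input and output paths and the class name are illustrative assumptions.

```java
// Packs every small file under an input directory into one SequenceFile,
// keyed by the original file name, with the raw bytes as the value.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path inputDir = new Path("/data/landing/small-files");   // assumed input directory
        Path packed = new Path("/data/packed/files.seq");        // assumed output file

        SequenceFile.Writer writer =
                SequenceFile.createWriter(fs, conf, packed, Text.class, BytesWritable.class);
        try {
            for (FileStatus status : fs.listStatus(inputDir)) {
                byte[] buffer = new byte[(int) status.getLen()];  // small files fit in memory
                FSDataInputStream in = fs.open(status.getPath());
                try {
                    in.readFully(buffer);
                } finally {
                    in.close();
                }
                writer.append(new Text(status.getPath().getName()), new BytesWritable(buffer));
            }
        } finally {
            writer.close();
        }
    }
}
```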
Environment: Hortonworks, IBM InfoSphere BigInsights, MapReduce, HBase, HDFS, Hive, Pig, SQL, Cloudera Manager, Sqoop, Flume, Oozie, Java (JDK 1.6), Eclipse
Confidential, San Ramon, CA
Hadoop Developer
Responsibilities:
- Involved in conceptual, logical and physical data modeling and used a star schema in designing the data warehouse.
- Imported and exported data between databases and HDFS using Sqoop.
- Helped this medical group streamline business processes by developing, installing and configuring Hadoop ecosystem components that moved data from individual servers to HDFS.
- Installed and configured MapReduce, Hive and HDFS; implemented a CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Created Cassandra tables using CQL to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Created a POC to store server log data in Cassandra to identify system alert metrics (sketched at the end of this list).
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Designed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
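A minimal sketch of the kind of Cassandra proof of concept described above, written with the DataStax Java driver (an assumed choice); the keyspace, table and column names are illustrative.

```java
// Proof-of-concept: create a log table in Cassandra and insert one alert-level event.
import java.util.Date;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class ServerLogPoc {

    public static void main(String[] args) {
        // Contact point is an assumption (a local Cassandra node).
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Keyspace and table for server log events, partitioned by host and day.
        session.execute("CREATE KEYSPACE IF NOT EXISTS logs "
                + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
        session.execute("CREATE TABLE IF NOT EXISTS logs.server_events ("
                + "host text, day text, event_time timestamp, level text, message text, "
                + "PRIMARY KEY ((host, day), event_time))");

        // Insert one sample event that a monitoring job could later query for alerts.
        PreparedStatement insert = session.prepare(
                "INSERT INTO logs.server_events (host, day, event_time, level, message) "
                + "VALUES (?, ?, ?, ?, ?)");
        session.execute(insert.bind("node01", "2014-06-01", new Date(),
                "ALERT", "Disk usage above threshold"));

        cluster.close();
    }
}
```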
Environment: Cloudera, MapReduce, Cassandra, HDFS, Hive, Pig, SQL, Cloudera Manager, Sqoop, Flume, Oozie, Java (JDK 1.6), Eclipse
Confidential, Phoenix, AZ
Java Programmer
Responsibilities:
- Used Rational Rose for use case diagrams, class diagrams, sequence diagrams and object diagrams in the design phase.
- Involved in the creation of UML diagrams such as class, activity and sequence diagrams using the modeling tools of IBM Rational Rose.
- Involved in the full life cycle development of the modules for the project.
- Used Eclipse IDE for application development.
- Used Spring framework for dependency injection.
- Worked with Spring AOP for transaction management and logging (sketched at the end of this list).
- Used Struts (MVC) for developing presentation layer.
- Used JBoss application server for deploying applications.
- Used SOAP XML Web services for transferring data between different applications.
- Developed web services using top down approach from WSDL to Java.
- Used the MVC design pattern for designing the application, with JSP as the view component.
- Implemented the persistence layer using the Hibernate framework and integrated Hibernate with the Spring framework.
- Worked with complex SQL queries, SQL Joins and Stored Procedures using TOAD for data retrieval and update.
- Used JUnit for performing unit testing.
- Used Log4J to capture the logs that included runtime exceptions.
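A minimal sketch of Spring AOP logging advice of the kind referenced above, assuming AspectJ-style annotations; the pointcut package and class names are illustrative. Enabling it would also require <aop:aspectj-autoproxy/> and a bean definition in the Spring context.

```java
// Around advice that logs execution time and exceptions for service-layer methods.
import org.apache.log4j.Logger;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class ServiceLoggingAspect {

    private static final Logger LOG = Logger.getLogger(ServiceLoggingAspect.class);

    // The pointcut package is an assumption; it should match the real service layer.
    @Around("execution(public * com.example.service..*.*(..))")
    public Object logInvocation(ProceedingJoinPoint joinPoint) throws Throwable {
        long start = System.currentTimeMillis();
        try {
            return joinPoint.proceed();                    // run the intercepted method
        } catch (Throwable t) {
            LOG.error("Exception in " + joinPoint.getSignature(), t);
            throw t;
        } finally {
            LOG.debug(joinPoint.getSignature() + " took "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }
}
```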
Environment: Eclipse, Web Services, UML, Struts (MVC), Hibernate, Spring, JSP, WSDL, JMS, Rational Rose, JavaScript, JUnit, PL/SQL, Oracle 10g, SVN
Confidential
JAVA/J2EE Developer
Responsibilities:
- Developed lightweight business components and integrated applications using Struts.
- Designed and developed front-end, middleware and back-end applications.
- Optimized server- and client-side validation.
- Converted old Perl scripts into new Python scripts, adding new functions and features; developed automated test methods and documentation for these scripts.
- Worked with the team to help transition from Oracle to DB2.
- Developed the global logging module, used across all modules, with Log4J components.
- Developed the presentation layer for the credit enhancement module in JSP.
- Used Struts to implement the Model-View-Controller (MVC) architecture (sketched at the end of this list); validations were done on both the client side and the server side.
- Involved in the configuration management using ClearCase.
- Detecting and resolving errors/defects in the quality control environment.
- Used iBATIS for mapping Java classes to database tables.
- Involved in Code review and integration testing.
- Used static analysis tools such as PMD, FindBugs and Checkstyle.
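A minimal sketch of a Struts 1.2 action of the kind used in the MVC layer described above; the action class, form bean, request attribute and forward names are illustrative assumptions.

```java
// Hypothetical Struts 1.x action: validates the form server-side, then forwards to a JSP view.
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

public class CreditSearchAction extends Action {

    @Override
    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request, HttpServletResponse response)
            throws Exception {
        CreditSearchForm searchForm = (CreditSearchForm) form;

        // Server-side validation mirroring the client-side checks in the JSP.
        if (searchForm.getAccountNumber() == null
                || searchForm.getAccountNumber().trim().length() == 0) {
            return mapping.findForward("failure");   // forward names come from struts-config.xml
        }

        // Expose the submitted value to the view; a real action would call the business layer here.
        request.setAttribute("accountNumber", searchForm.getAccountNumber());
        return mapping.findForward("success");
    }
}

// Hypothetical form bean carrying the search criteria from the JSP.
class CreditSearchForm extends ActionForm {

    private String accountNumber;

    public String getAccountNumber() {
        return accountNumber;
    }

    public void setAccountNumber(String accountNumber) {
        this.accountNumber = accountNumber;
    }
}
```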
Environment: Java 1.6, J2EE 6, Struts 1.2, iBATIS, XML, JSP, CSS, Python, HTML, JavaScript, jQuery, Oracle 10g, DB2, Unix, RAD, ClearCase, WebSphere 8.0 (beta)