Sr Hadoop Developer / Admin Resume
Santa Clara, CA
SUMMARY:
- 6+ years of work experience in IT, including 4+ years of experience in the installation, development, and implementation of Hadoop.
- Experience with Apache Hadoop components such as HDFS, MapReduce, HiveQL, HBase, Pig, and Sqoop, as well as Big Data and Big Data analytics.
- Hands-on experience with MapReduce jobs. Experience in installing, configuring, and administering Hadoop clusters of the major Hadoop distributions.
- Hands-on experience in installing, configuring, and using ecosystem components such as Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, and Pig.
- Experienced in writing complex MapReduce programs that work with different file formats such as Text, Sequence, and XML.
- Installation, configuration, management, support, and monitoring of Hadoop clusters using various distributions and services such as Apache Spark, Cloudera, and the AWS service console.
- Software developer in Java application development, client/server applications, and internet/intranet database applications; experienced in developing, testing, and implementing application environments using J2EE, JDBC, JSP, Servlets, Web Services, Oracle, PL/SQL, and relational databases.
- Experience in the design and development of web-based applications using HTML, DHTML, CSS, JavaScript, jQuery, JSP, and Servlets.
- Knowledge of distributed systems and test-driven development.
- Hands-on experience with Amazon Web Services (AWS) cloud services such as EC2, S3, and EBS.
- Experience with NoSQL column-oriented databases such as HBase and Cassandra and their integration with Hadoop clusters.
- Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in a Cloudera cluster.
- Experience in data processing, including collecting, aggregating, and moving machine data from various sources using Apache Flume and Kafka.
- Good hands-on experience with full life cycle implementations using the CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
- Good experience in recognizing and reusing design patterns, J2EE design patterns, and SOA design patterns.
- Good work experience with Hibernate, an open-source object/relational mapping framework.
- Good work experience in the development of EJBs (Entity, Session, and Message-Driven Beans).
- Solid design skills using Java Design Patterns and Unified Modeling Language (UML).
- Experience working with different operating systems: Windows 98/NT/2000/XP, UNIX, and Linux.
- Good expertise with development tools such as Eclipse.
- Sound relational database concepts; worked extensively with Oracle and DB2. Very good at writing complex SQL queries and PL/SQL procedures.
- Possess excellent communication, interpersonal, and analytical skills along with a positive attitude.
TECHNICAL SKILLS:
Big Data: Hadoop, MapReduce, HDFS, Hive, Pig and Sqoop, Spark, Kafka, Cassandra
Web Technologies: J2EE, JSP, Servlets, Web Services, JDBC, MVC, JSTL, DOM, CSS, JQuery.
Frameworks: Spring, Struts, JSF, Hibernate.
Development Tools: Eclipse, WSAD 6.0, ANT 1.7, Maven 2, Log4j, Rapid Application Developer, Dreamweaver 8
Languages: Java
Design and Modeling: UML and Rational Rose.
Databases: Oracle, MS SQL Server, MySQL, DB2, SQL/PLSQL
Version Control: CVS, SVN and Clear Case.
Environment: UNIX, Red Hat Linux, Windows 2000, Windows NT 4.0, Windows XP, Windows Vista, Windows 7, and Windows 8.
PROFESSIONAL EXPERIENCE:
Confidential, Santa Clara, CA
Sr Hadoop Developer / Admin
Responsibilities:
- Worked on the proof-of-concept for Apache Hadoop framework initiation.
- Installed and configured Hadoop, MapReduce, and HDFS.
- Developed multiple MapReduce jobs using the Java API for data cleaning and preprocessing (see the sketch after this list).
- Imported and exported data between HDFS/Hive and an Oracle 11g database using Sqoop.
- Responsible for managing data coming from different sources.
- Developed applications using MySQL for database design.
- Monitored the running MapReduce programs on the cluster.
- Responsible for loading data from UNIX file systems into HDFS.
- Installed and configured Hive and Pig.
- Populated HDFS and HBase with huge amounts of data using Apache Kafka
- Worked on Amazon AWS services such as Kinesis, Lambda, EMR, and EC2 for fast and efficient processing of Big Data.
- Experienced in implementing the Hortonworks distribution (HDP 2.1, 2.2, and 2.3).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the backend.
- Involved in integrating the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
- Experience upgrading the Hadoop cluster (HBase/ZooKeeper) from CDH3 to CDH4.
- Developed Python scripts to copy data between the clusters, enabling large volumes of data to be copied quickly.
- Experience working on Talend ETL for performing data migration and data synchronization processes on the data warehouse.
- Developed multiple POCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Created data pipelines in the cloud using Azure Data Factory.
- Wrote Pig scripts to process unstructured data and create structured data for use with Hive.
- Performed benchmarking of the NoSQL databases Cassandra and HBase.
- Developed scripts and automated end-to-end data management and synchronization between all the clusters.
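The data-cleansing MapReduce work above can be illustrated with a minimal Java sketch. This is a hedged example, not the actual project code: the pipe-delimited input layout, the expected field count, and the class names are assumptions for illustration only.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only cleansing job: drops malformed records and trims fields
// before the data is loaded into Hive (assumed pipe-delimited input).
public class CleanseRecordsJob {

    public static class CleanseMapper
            extends Mapper<Object, Text, NullWritable, Text> {

        private static final int EXPECTED_FIELDS = 5; // assumed layout

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|", -1);
            if (fields.length != EXPECTED_FIELDS) {
                context.getCounter("cleanse", "malformed").increment(1);
                return; // skip malformed rows
            }
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) out.append('|');
                out.append(fields[i].trim());
            }
            context.write(NullWritable.get(), new Text(out.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "cleanse-records");
        job.setJarByClass(CleanseRecordsJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0); // map-only job
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A map-only job like this (zero reducers) writes the cleansed records straight back to HDFS, where they can then be loaded into Hive tables.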
Environment: Apache Hadoop, MapReduce, HDFS, Spark, Java (JDK 1.6), Oracle 11g/10g, MySQL, Windows, UNIX, Sqoop, Hive, YARN, Pig, NoSQL, ZooKeeper, Cassandra, Kafka, AWS, Azure, Hortonworks, ETL, Cloudera, and Python.
Confidential, NYC, NY
Hadoop Admin / Developer
Responsibilities:
- Developed MapReduce jobs in Java for data cleansing and preprocessing.
- Moved data from Oracle to HDFS and vice versa using Sqoop.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- ETL off-loading from Teradata to Hadoop; real-time streaming of data using Spark with Kafka.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Hands-on experience with Apache Spark using Scala. Implemented a Spark solution to enable real-time reporting from Cassandra data.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis.
- Created Hive tables as per requirements, as internal or external tables with appropriate static and dynamic partitions, for efficiency.
- Developed Python scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation, queries, and writing data back into the RDBMS through Sqoop.
- Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
- Implemented partitioning and bucketing in Hive for better organization of the data.
- Performed benchmarking of the NoSQL databases Cassandra and HBase.
- Experience in automating code deployment across multiple cloud providers such as Amazon Web Services, Microsoft Azure, Google Cloud, VMware, and OpenStack.
- Worked with different file formats and compression techniques to determine standards
- Developed Hive queries and UDFs to analyze/transform the data in HDFS (see the UDF sketch after this list).
- Developed Hive scripts for implementing control tables logic in HDFS.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Developed Pig scripts and UDFs as per the business logic.
- Analyzing/Transforming data with Hive and Pig.
- Developed Oozie workflows that are scheduled through a scheduler on a monthly basis.
- Designed and developed read lock capability in HDFS.
- Implemented a Hadoop Float equivalent to the Oracle Decimal type.
- Used Cloudera Manager for continuous monitoring and managing the Hadoop cluster.
- Managed and supported the infrastructure.
- Monitored and debugged Hadoop jobs/applications running in production.
- Provided user support and application support on the Hadoop infrastructure.
- Reviewed ETL application use cases before onboarding to Hadoop.
- Evaluated and compared different tools for test data management with Hadoop.
- Helped and directed the testing team to get up to speed on Hadoop application testing.
- Installed a 20-node UAT Hadoop cluster.
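As a rough illustration of the Hive UDF work described in this list, here is a minimal Java UDF sketch; the function name and the normalization it performs are assumptions for illustration, not the project's actual UDFs.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF that trims and lower-cases a string column,
// illustrating the kind of analyze/transform UDFs mentioned above.
@Description(name = "normalize_str",
             value = "_FUNC_(str) - trims and lower-cases the input string")
public class NormalizeString extends UDF {

    private final Text result = new Text();

    public Text evaluate(Text input) {
        if (input == null) {
            return null; // pass NULLs through unchanged
        }
        result.set(input.toString().trim().toLowerCase());
        return result;
    }
}
```

Packaged into a jar, such a UDF is registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION before it can be used in queries.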
Environment: Apache Hadoop 2.0.0, Pig 0.11, Hive 0.10, Sqoop 1.4.3, Flume, MapReduce, HDFS, Linux, Oozie, Cassandra, Hue, HCatalog, ZooKeeper, Java, Eclipse, VSS, Red Hat Linux, Python, ETL, NoSQL, AWS, Kafka, VMware, Azure
Confidential, San Jose, CA
Big Data / Hadoop Consultant
Responsibilities:
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Involved in installing Hadoop Ecosystem components.
- Managed and reviewed the Hadoop log files.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Wrote MapReduce jobs using the Java API.
- Installed and configured Pig and wrote Pig Latin scripts.
- Involved in managing and reviewing Hadoop log files.
- Used Sqoop to load data from MySQL and Oracle into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet the business requirements (see the JDBC sketch after this list).
- Created Hive tables and worked on them using HiveQL.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Weekly meetings with technical collaborators and active participation in code review sessions with senior and junior developers.
- Used JUnit for unit testing and Continuum for integration testing.
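For the Hive analysis queries referenced in this list, a minimal Java-over-JDBC sketch is shown below. It assumes a HiveServer2 endpoint; the host, user, and table name are placeholders rather than project details.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Runs a HiveQL aggregation over JDBC (hypothetical endpoint and table).
public class HiveQueryExample {

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hive-server:10000/default", "etl_user", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT dt, COUNT(*) AS cnt FROM web_logs GROUP BY dt")) {
            while (rs.next()) {
                // Print date and row count, one line per partition date.
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```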
Environment: Cloudera, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Java, Red Hat Linux, XML, MySQL, Eclipse, JUnit
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in various phases of Software Development Life Cycle (SDLC) of the application like Requirement gathering, Design, Analysis and Code development.
- Designed and developed the front end using HTML, CSS, JavaScript, and Ajax.
- Developed the entire application implementing MVC Architecture integrating Hibernate and Spring frameworks.
- Involved in development of presentation layer using JSP and Servlets with Development tool Eclipse IDE.
- Worked on the Hibernate persistence layer, including mapping files, the configuration file, and classes to interact with the database (see the sketch after this list).
- Used XML/XSL and parsed documents using both SAX and DOM parsers.
- Implemented web services with Apache Axis.
- Designed and developed EJBs to handle business logic and store persistent data.
- Planned and implemented various SQL queries, stored procedures, and triggers.
- Wrote JUnit test cases for unit testing.
- Responsible for testing, debugging, bug fixing and documentation of the system.
- Deployed the application on the Tomcat server.
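A minimal sketch of the Hibernate persistence work referenced in the bullet above; the Customer entity, table name, and DAO are illustrative assumptions rather than the application's actual classes.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

// Annotated entity mapped to an assumed CUSTOMER table.
@Entity
@Table(name = "CUSTOMER")
public class Customer {

    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "NAME", nullable = false)
    private String name;

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// Small DAO that persists an entity; in the application the SessionFactory
// would typically be provided by the Spring configuration mentioned above.
class CustomerDao {

    private final SessionFactory sessionFactory;

    CustomerDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    void save(Customer customer) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.save(customer);
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }
}
```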
Environment: Java, Servlets, JSP, JQuery, JUnit, Hibernate 3.2, JPA 2.0, Spring 2.5, Ajax, Oracle 10g, JMS, Eclipse 3.4, Apache Ant, Tomcat, Web Services, Apache Axis 2.0, WebSphere 6.1, JavaScript, HTML, CSS, XML, ClearCase.