
Senior Hadoop Developer Resume

New York, NY


  • 7.5+ years of professional experience in the IT industry, with 3.5+ years of hands-on expertise in Big Data processing using Hadoop, including Hadoop Ecosystem implementation, maintenance, ETL and Big Data analysis operations.
  • Excellent understanding of Hadoop architecture and underlying frameworks including distributed storage (HDFS), resource management (YARN) and MapReduce programming paradigm.
  • Highly experienced in using various Hadoop Ecosystem projects such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Flume and Oozie for ingestion, storage, querying, processing and analysis of Big Data.
  • Hands-on expertise in writing MapReduce jobs using Java.
  • Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra and MongoDB.
  • Experience in managing and monitoring Hadoop clusters and services using Cloudera Manager.
  • Collected log data from various sources and integrated it into HDFS using Flume.
  • Knowledge of project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Experience in analyzing data using Pig and Hive, including developing custom User Defined Functions (UDFs) to incorporate Java methods and functionality into Pig Latin and HiveQL.
  • Experience in using ZooKeeper distributed coordination service for High-Availability.
  • Good knowledge of using Oozie Workflow Scheduler system for MapReduce jobs.
  • Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
  • Extensive experience in Hadoop Administration activities such as installation, configuration, maintenance and deployment of Big Data components and the underlying infrastructure of Hadoop cluster.
  • Experience in Hadoop cluster planning, monitoring and troubleshooting.
  • Hands-on expertise on multi node cluster setup using Apache Hadoop and Cloudera Hadoop distributions.
  • Experience in optimizing MapReduce, Pig and Hive job configurations for better performance.
  • Red Hat certified System Administrator on Red Hat Enterprise Linux 7.
  • Good working knowledge of Puppet for configuration management.
  • Experience in working with Java/J2EE environments as a developer.
  • Well versed in Object Oriented Programming and Software Development Life Cycle.
  • In-depth understanding of Data Structures, Algorithms and Optimization.
  • Extensive experience with Databases such as Oracle 10g/11g, Microsoft SQL Server, MySQL.
  • Experience in writing complex SQL queries, PL/SQL Procedures, Functions and Triggers.


Hadoop and Big Data Technologies: Apache Hadoop, Cloudera CDH4/CDH5, HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Flume, ZooKeeper, Oozie, HBase, Cloudera Manager, Hue

Databases: Oracle 10g/11g, Microsoft SQL Server, MySQL, PostgreSQL, MS Access

Programming Languages: C, C++, Java

Scripting & Query Languages: Bash Shell Scripting, SQL, PL/SQL

Java/Web Technologies: J2EE, Spring, Hibernate, JSP, Servlets, JMS, HTML, CSS, XML, JavaScript, AJAX

Operating Systems: Linux (Red Hat, CentOS, Ubuntu), Windows

Linux Servers: DNS, DHCP, FTP, NFS, NIS, LDAP, SAMBA, SQUID, NTP, Apache Webserver

Storage Management: Disk Partitioning, SWAP Partitioning, iSCSI Storage, LVM, Disk Quota, RAID

Security: Firewalld, IPTables, Kerberos, SSL/TLS, SELinux, SSH, TCP Wrappers

Virtualizations and Cloud: VirtualBox, VMware Workstation, Amazon EC2

Networking: TCP/IP, Ethernet Networking, IP Addressing, IP Subnetting

Monitoring/Troubleshooting: Nagios Core, Ganglia, WhatsUp Gold, top, netstat, IPTraf, tcpdump, nmap, ps

Configuration Management: Puppet

Development Tools: Eclipse, NetBeans, Oracle SQL Developer, SQL*Plus Instant Client

Other Tools: Microsoft Visio, Crystal Reports, PuTTY


Confidential, New York, NY

Senior Hadoop Developer


  • Worked in a team that built big data analytic solutions using Cloudera Hadoop Distribution.
  • Migrated RDBMS (Oracle, MySQL) data into HDFS and Hive using Sqoop.
  • Extensively used Pig scripts for data cleansing and optimization.
  • Worked on creating and optimizing Hive scripts based on business requirements.
  • Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Involved in loading data from UNIX file system to HDFS.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Used JUnit for unit testing of MapReduce jobs.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Exported result sets from Hive to MySQL using shell scripts.
  • Used ZooKeeper for various types of centralized configuration.
  • Involved in maintaining various Unix Shell scripts.
  • Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access.
  • Created and managed Hive external tables in Hadoop for Data Warehousing.
  • Developed customized UDFs in Java for extending Pig and Hive functionality.
  • Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
  • Integrated the Hive Warehouse with HBase.
  • Configured Oozie to run multiple Hive and Pig jobs independently, triggered by time and data availability.
  • Worked on optimization and performance tuning of MapReduce, Hive & Pig Jobs.
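The Hive partitioning and bucketing work described above can be illustrated with a short sketch; the table, column and path names below are hypothetical, not taken from the project, and a live Hive installation is assumed:

```sql
-- External table partitioned by date and bucketed by user_id (illustrative)
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  user_id STRING,
  url     STRING,
  bytes   BIGINT
)
PARTITIONED BY (log_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/data/warehouse/web_logs';

-- Dynamic partition insert: the partition column comes last in the SELECT
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE web_logs PARTITION (log_date)
SELECT user_id, url, bytes, log_date FROM staging_logs;
```

Partition pruning on log_date and bucket-wise joins on user_id are the usual payoff of a layout like this.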

ENVIRONMENT: CDH5, HDFS, MapReduce, Pig, Hive, Sqoop, HBase, Oozie, Hue, Java, Linux

Confidential - New York, NY

Hadoop Developer


  • Worked on implementation and maintenance of Cloudera Hadoop Cluster.
  • Monitored workload, job performance and node health using Cloudera Manager.
  • Involved in setting up High-availability Hadoop cluster using Quorum Journal Manager (QJM) and ZooKeeper.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Responsible for managing data coming from different sources.
  • Gained hands-on experience with NoSQL databases.
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
  • Proactively monitored systems and services; handled architecture design and implementation of Hadoop deployments, configuration management, backups, disaster recovery systems and procedures, and review of Hadoop log files.
  • Assisted in configuration and maintenance of Hadoop ecosystem components such as Pig, Hive and HBase.
  • Imported data from RDBMS (MySQL, Teradata) to HDFS and vice versa using Sqoop (Big Data ETL tool) for Business Intelligence, visualization and report generation.
  • Used Flume to collect and aggregate web log data from different sources and pushed to HDFS.
  • Developed and executed custom MapReduce programs, PigLatin scripts and HiveQL queries.
  • Developed and optimized Pig and Hive UDFs to implement the methods and functionality of Java as required.
  • Analyzed user request patterns and implemented various performance optimization measures including but not limited to implementing partitions and buckets in HiveQL.
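An RDBMS-to-Hive import of the kind described above might look roughly like this; the connection string, credentials, table and database names are placeholders, and the command requires a configured Sqoop/Hadoop cluster to run:

```
# Illustrative Sqoop import from MySQL into a Hive table (hypothetical names)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --split-by order_id \
  --num-mappers 4 \
  --hive-import \
  --hive-table analytics.orders
```

--split-by picks the column used to divide the table among the parallel mappers; the matching export direction uses sqoop export with --export-dir.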

ENVIRONMENT: CDH4, HDFS, YARN, MapReduce, Pig, Hive, Sqoop, Flume, ZooKeeper, Nagios, Java, Linux, Shell Scripting

Confidential - New York, NY

Big Data Consultant / Hadoop Consultant


  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Collaborated with the systems engineering team to plan and deploy hardware and software environments required for new Hadoop environments and to expand existing Hadoop cluster.
  • Worked on the logical implementation of, and application interaction with, HBase.
  • Used a fast scripted alternative to Sqoop to automate the transfer of data from Oracle to HBase.
  • Performed data analysis using Hive and Pig.
  • Loaded log data into HDFS using Flume.
  • Worked with application teams to install Hadoop updates, patches and version upgrades as required.
  • Installed and configured various components of Hadoop Ecosystem and maintained their integrity.
  • Responsible for cluster maintenance, commissioning and decommissioning of DataNodes, cluster monitoring, troubleshooting, managing of data backups and disaster recovery systems, analyzing Hadoop log files.
  • Configured Hive with a shared metastore in MySQL and used Sqoop to migrate data into external Hive tables from different RDBMS sources (Oracle, Teradata and DB2) for data warehousing.
  • Developed Hive queries and custom UDFs to bring data into a structured format.
  • Installed and implemented monitoring tools like Ganglia and Nagios to monitor Hadoop cluster environment.
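A shared MySQL-backed Hive metastore of the kind mentioned above is typically wired up in hive-site.xml roughly as follows; the host, database name and username here are placeholders:

```xml
<!-- Illustrative hive-site.xml fragment: shared Hive metastore in MySQL -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-host:3306/hive_metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
```

Pointing every Hive client (and HiveServer) at the same ConnectionURL is what lets multiple users share one set of table definitions.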

ENVIRONMENT: Hadoop 1.x, HDFS, MapReduce, Pig, Hive, Sqoop, Nagios, Ganglia, MySQL, Linux, NFS, Shell scripting

Confidential - Whitehouse Station, NJ

Java/J2EE Developer - Consultant


  • Developed the entire application in J2EE using a Spring MVC-based architecture.
  • Involved in design and interacted with business intelligence team during requirement analysis.
  • Involved in developing Data models and Database schemas.
  • Used Hibernate ORM framework for data persistence layer.
  • Involved in creating the Hibernate POJO Objects and mapped using Hibernate Annotations.
  • Developed user interface using JSP, HTML, XML and CSS to simplify the complexities of the application.
  • Used JMS for asynchronous messaging between different modules.
  • Used WebSphere Application Server to develop and deploy the application.
  • Actively participated in client meetings and took inputs for additional functionality.
  • Actively involved in code reviews and bug fixing to improve performance.
  • Wrote JUnit test cases.

ENVIRONMENT: Java/J2EE, Spring, Hibernate, JSP, Servlets, JMS, HTML, XML, CSS, SQL, PL/SQL, Oracle 11g, WebSphere

Confidential - New York, NY

Java/J2EE Developer - Consultant


  • Involved in various phases of development life cycle of the project and participated in daily SCRUM meetings.
  • Participated in JAD meetings to gather the requirements and understand the End-user Systems.
  • Created detailed design documents using UML (Use case and Sequence diagrams).
  • Implemented MVC design pattern using Spring Framework.
  • Implemented the functionality of mapping entities to the database using Hibernate.
  • Used Spring framework to define beans for Entity and corresponding dependent services.
  • Used HTML, CSS, XML, JavaScript and JSP for interactive cross browser functionality and complex user interface.
  • Wrote queries, stored procedures and functions using SQL and PL/SQL in Oracle.
  • Involved in Bug fixing of various modules that were raised by the testing teams in the application.
  • Re-designed core framework and database access by replacing JDBC and using Spring and Hibernate.
  • Worked on performance tuning at different levels, from the database level up to the application level.

ENVIRONMENT: Java/J2EE, Spring, Hibernate, JSP, Servlets, HTML, CSS, JavaScript, XML, PL/SQL, Oracle 10g, WebLogic

Confidential, Wayne, NJ

Oracle Developer - Consultant


  • Designed, developed and maintained Oracle database schemas, tables, standard views, materialized views, synonyms, indexes, constraints, sequences, cursors and other database objects.
  • Wrote Stored Procedures, Functions and Triggers using PL/SQL to implement business rules and processes.
  • Created and modified existing functions and procedures based on business requirements.
  • Involved in developing ER Diagrams, Physical and Logical Data Models using Microsoft Visio.
  • Wrote complex SQL using joins, sub-queries and correlated sub-queries.
  • Used Analytic Functions to simplify and handle complex query problems in SQL.
  • Populated tables from text files using SQL*Loader.
  • Managed privileges on tables and other objects for outside schema users.
  • Implemented Table Partitioning and Sub-Partitioning to improve performance and data management.
  • Worked with business users to gather requirements for developing new reports.
  • Anticipated future requirements and adapted the database model accordingly.
  • Tuned and optimized complex SQL queries.

ENVIRONMENT: Oracle 10g, Windows, SQL, PL/SQL, SQL Developer, SQL*Loader, Microsoft Visio, Crystal Reports
