Hadoop Developer Resume
OBJECTIVE:
Seeking a challenging role in the IT industry as a Technical Architect/Developer/Quality Assurance Analyst, contributing to organizational success while advancing professionally.
SUMMARY:
- Over 8 years of diversified IT experience, including 2 years implementing Big Data solutions using the Cloudera distribution of Apache Hadoop.
- Worked on a 34-node Cloudera Hadoop cluster (CDH 5.2) for the SCPP EVO LRI EQ project.
- Used Spark for real-time application queries and end-of-day (EOD) batches.
- Expertise in Hadoop architecture and its components: Hadoop Distributed File System (HDFS), MapReduce, NameNode, DataNode, JobTracker, TaskTracker, and Secondary NameNode.
- Good understanding of Hadoop MapReduce programming paradigm.
- Good Knowledge on Hadoop Cluster architecture and monitoring.
- Experience writing Hive and Pig queries through the command-line shell.
- Experience in managing and reviewing Hadoop log files.
- Strong understanding of Hadoop ecosystem components such as HDFS, MapReduce, Sqoop, Flume, Oozie, Pig, Hive, HBase, and Zookeeper.
- Proficiency in Java, Hadoop MapReduce, Pig, Hive, HBase, Sqoop, Flume, Scala, Spark, Kafka, Storm, Oozie, and Impala.
- Experience developing MapReduce programs using Apache Hadoop for analyzing Big Data as per requirements.
- Working knowledge of major Hadoop ecosystem components: Pig, Hive, HBase, and Cloudera Manager.
- Experience developing Pig Latin scripts and using Hive Query Language (HiveQL).
- Experience working with NoSQL databases, including Cassandra and HBase.
- Experience developing Pig Latin and HiveQL scripts for data analysis and ETL, extending their default functionality with User Defined Functions (UDFs) for data-specific processing.
- Experience migrating data to and from RDBMSs and unstructured sources into HDFS using Sqoop and Flume.
- Hands-on experience developing workflows that execute MapReduce, Sqoop, Flume, Hive and Pig scripts using Oozie.
- Well-versed database development knowledge using SQL data types, Indexing, Joins, Views, Transactions, Large Objects and Performance tuning.
- Good knowledge of data warehousing concepts, ETL, and Teradata.
- Experience writing Shell scripts in Linux OS and integrating them with other solutions.
- Fluent with the core Java concepts like I/O, Multi-threading, Exceptions, RegEx, Collections, Data-structures and Serialization.
- Expertise in using automation testing tools such as HP QuickTest Professional (QTP) and LoadRunner.
- Strong knowledge of database programming using RDBMSs such as SQL Server 7.0/2000/2005, Oracle 7.0/8/8i/9i, and MS Access. Expertise in writing PL/SQL queries, stored procedures, triggers, packages, and cursors.
- Good knowledge in Quality Assurance Life Cycle (QALC), Software Development Life Cycle (SDLC), Software Test Life Cycle (STLC), Object Oriented Analysis and Design (OOAD).
- Excellent analytical skills to understand business processes, functionality, and requirements, and to translate them into system requirement specifications.
- Experience preparing test plans and test data and executing test cases to ensure application functionality meets user requirements.
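The Sqoop-based RDBMS-to-HDFS migration noted above can be sketched as a small shell wrapper. The connection string, table, and target directory are hypothetical examples; `SQOOP` defaults to `echo sqoop` so the sketch prints the command rather than executing it on a cluster:

```shell
#!/usr/bin/env sh
# Hypothetical sketch of a Sqoop import from an RDBMS into HDFS.
# SQOOP defaults to `echo sqoop` so the command is printed, not run;
# set SQOOP=sqoop on a host with a configured Hadoop/Sqoop installation.
SQOOP="${SQOOP:-echo sqoop}"

import_orders() {
    $SQOOP import \
        --connect jdbc:oracle:thin:@db-host:1521/ORCL \
        --username etl_user -P \
        --table ORDERS \
        --target-dir /data/raw/orders \
        --num-mappers 4  # four parallel map tasks split the import
}

import_orders
```

On a real cluster the same wrapper could be driven from Oozie or cron; only the `SQOOP` variable changes.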
SKILL:
Hadoop Ecosystem: HDFS, MapReduce, Sqoop, Flume, Hive, Pig, HBase, YARN, Oozie, Impala, Zookeeper, Kafka, Cloudera Manager, Spark
Hadoop Distributions: Apache Hadoop, CDH3, CDH4, Hortonworks.
Programming Languages: Core Java, C, HTML, Visual Basic, ASP.NET, .NET, ADO.NET, XML, Scala
Scripting Languages: Unix/Linux Shell Scripting, JavaScript, VBScript, Python
Automation/ETL Tools: HP Quick Test Pro, iMacros, Selenium, Ab Initio, Excel Macro.
IDE/Tools/Utilities: Eclipse IDE, MS Visual Studio 2010, Control M, Tivoli.
Methodologies: UML, OOP and Agile-Scrum.
Database Technologies: Oracle 10g/11g, MS SQL Server, Teradata, and data warehouses
NoSQL Databases: HBase and Cassandra
Application/Web Servers: Apache, Tomcat, MSIIS, Splunk
Version Control Tools: Tortoise CVS Client, SVN, MS Team Foundation Server (TFS).
Defect Tracking Tools: Test Director, HP Quality Center, Jira, HP ALM.
Operating Systems: LINUX/UNIX, Windows 7, Windows Server 2003/2008
EXPERIENCE:
Hadoop Developer
Confidential
Responsibilities:
- Planned, installed, configured, maintained, and monitored Hadoop clusters using Apache Cloudera (CDH4, CDH5) distributions.
- Worked on Cloudera Hadoop upgrades and patches, and installed ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
- Set up data ingestion tools such as Flume, Sqoop, SFTP, and NDM.
- Installed and set up HBase.
- Developed automated Unix shell scripts for running the balancer, file system health checks, schema creation in Hive, and user/group creation on HDFS.
- Developed applications and provided solutions to business requirements.
- Added and decommissioned Hadoop cluster nodes, including balancing HDFS block data.
- Set up quotas on HDFS and implemented rack topology scripts.
- Managed and reviewed Hadoop log files; performed file system management and monitoring and Hadoop cluster capacity planning.
- Configured Sqoop and exported/imported data into HDFS.
- Performed cluster maintenance, including adding and removing cluster nodes, monitoring, and troubleshooting.
- Managed log files: Hadoop logs older than 7 days were removed from the log folder, loaded into HDFS, and retained for 2 years for audit purposes.
- Worked on the Sqoop API and created a customized Sqoop build for the CDS distribution with many added features.
- Collaborated with cross-functional teams to ensure that applications were properly tested, configured, and deployed.
- Integrated Hadoop connectors for various databases into the existing Sqoop setup.
- Provided solutions for streamlining data.
- Used compression and encryption technologies to process data before storing it in HDFS.
- Successfully moved data from one database to another by landing files in HDFS.
- Worked with GPFS, Hive, Exacta, MS SQL Server, and Teradata.
- Used Oozie for scheduling jobs on HDFS.
- Wrote multiple MapReduce jobs for various requirements and purposes.
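The 7-day log-retention process described above might be automated with a shell script along these lines. The paths are hypothetical, and `HDFS_PUT` defaults to a printing stand-in so the sketch runs without a cluster; on a real installation it would be `hdfs dfs -put`:

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the log-retention job: local Hadoop logs older than
# 7 days are archived to HDFS (for the 2-year audit window) and then removed
# from the local log folder.
LOG_DIR="${LOG_DIR:-./hadoop-logs}"                 # local log folder (assumed)
HDFS_ARCHIVE="${HDFS_ARCHIVE:-/audit/hadoop-logs}"  # HDFS archive path (assumed)
HDFS_PUT="${HDFS_PUT:-echo hdfs dfs -put}"          # use 'hdfs dfs -put' on a real cluster

archive_old_logs() {
    # select regular .log files not modified in the last 7 days
    find "$LOG_DIR" -type f -name '*.log' -mtime +7 | while read -r f; do
        # copy to the archive, then delete locally only if the copy succeeded
        $HDFS_PUT "$f" "$HDFS_ARCHIVE/" && rm -f "$f"
    done
}

if [ -d "$LOG_DIR" ]; then
    archive_old_logs
fi
```

A script like this would typically run nightly from cron or an Oozie coordinator.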
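Job scheduling through Oozie, as mentioned above, is usually driven by a short submission wrapper. The Oozie server URL and properties path here are hypothetical, and `OOZIE` defaults to `echo oozie` so the sketch prints the command instead of contacting a server:

```shell
#!/usr/bin/env sh
# Hypothetical sketch of submitting an Oozie workflow job.
# OOZIE defaults to `echo oozie` so the command is printed, not executed.
OOZIE="${OOZIE:-echo oozie}"

run_workflow() {
    $OOZIE job \
        -oozie http://oozie-host:11000/oozie \
        -config /home/etl/workflows/job.properties \
        -run
}

run_workflow
```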
Environment: Java, RESTful services, Hadoop, MapReduce, Hive, HBase, Sqoop, JUnit, Oracle, Teradata, Greenplum, TDCH, Ab Initio, Control-M, Oozie, Oracle Hadoop connectors, and Tableau
Hadoop Developer
Confidential
Responsibilities:
- Responsible for architecting Hadoop clusters with CDH3.
- Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
- Installed the cluster; worked on commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Developed automated Unix shell scripts for running the balancer, file system health checks, schema creation in Hive, and user/group creation on HDFS.
- Developed applications and provided solutions to business requirements.
- Added and decommissioned Hadoop cluster nodes, including balancing HDFS block data.
- Set up quotas on HDFS and implemented rack topology scripts.
- Managed and reviewed Hadoop log files; performed file system management and monitoring and Hadoop cluster capacity planning.
- Managed log files: Hadoop logs older than 7 days were removed from the log folder, loaded into HDFS, and retained for 2 years for audit purposes.
- Created various MapReduce jobs to perform ETL transformations on transactional and application-specific data sources.
- Configured Flume to ingest trade data into HBase from various JMS sources (MQ).
- Designed and managed Sqoop jobs that uploaded data from Oracle to HDFS and Hive, and vice versa.
- Performed joins, group-by, and other operations in MapReduce using Java and Pig.
- Processed and formatted the output from Pig and Hive before writing it to the Hadoop output file.
- Reviewed HDFS usage and system design for future scalability and fault tolerance.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Wrote and executed Pig scripts using the Grunt shell.
- Installed and configured Hadoop, MapReduce, and HDFS.
- Used HiveQL to analyze the data and identify correlations.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Installed and configured Pig and wrote Pig Latin scripts.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Used Flume to collect log data with error messages across the cluster.
- Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
- Applied a good understanding of partitioning and bucketing concepts in Hive to optimize performance.
- Developed and scheduled Autosys jobs for the EOD process.
- Managed Hadoop clusters, including adding and removing cluster nodes for maintenance and capacity needs.
- Monitored and managed the Hadoop cluster using Cloudera Manager.
- Actively updated upper management with daily progress reports, including the classification levels achieved on the data.
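The Hive partitioning and bucketing used above for performance could be declared roughly as follows. The table and column names are hypothetical, and `HIVE_CMD` defaults to `echo` so the DDL is printed rather than executed; set `HIVE_CMD='hive -e'` on a real installation:

```shell
#!/usr/bin/env sh
# Hypothetical sketch of a partitioned, bucketed Hive table declaration.
HIVE_CMD="${HIVE_CMD:-echo}"

create_trades_table() {
    $HIVE_CMD "
    CREATE TABLE IF NOT EXISTS trades (
        trade_id BIGINT,
        symbol   STRING,
        price    DOUBLE
    )
    PARTITIONED BY (trade_date STRING)    -- prunes scans to the queried days
    CLUSTERED BY (symbol) INTO 32 BUCKETS -- helps sampling and bucketed joins
    STORED AS ORC;"
}

create_trades_table
```

Partitioning by date keeps daily EOD queries from scanning the whole table, and bucketing by a join key supports efficient sampled and bucketed joins.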
Environment: Hadoop, MapReduce, Java, Flume, Sqoop, HBase, Hive, Pig, Autosys Scheduler, Oracle, shell scripting, NoSQL, XML, Cloudera Manager
