Hadoop Developer/Admin Resume
New York, New York
SUMMARY
- Over 6 years of IT experience as a Developer, Designer & Quality Tester with cross-platform integration experience using the Hadoop ecosystem, Java, and software functional testing
- Over 3 years of experience with Hadoop infrastructure, including MapR, Hive, Oozie, Sqoop, and HBase
- Good experience as a Software Engineer with IT technologies and good working knowledge of Java and the Big Data Hadoop ecosystem
- Good experience with Hadoop infrastructure, including MapR, Hive, Oozie, Sqoop, HBase, Pig, HDFS, YARN, Spark, and Impala configuration projects in direct client-facing roles
- Good knowledge of Data Structures, Algorithms, Object-Oriented Design, and Data Modeling
- Strong knowledge of Core Java programming, including Collections, Generics, exception handling, and multithreading
- Good knowledge of data warehousing, ETL development, distributed computing, and large-scale data processing
- Good knowledge of the design and implementation of big data pipelines
- Knowledge of implementing ETL/ELT processes with MapReduce, Pig, and Hive
- Hands-on experience with major components of the Hadoop ecosystem, including Hive, HBase, HBase & Hive integration, Sqoop, and Flume, and knowledge of the MapReduce/HDFS framework
- Extensive experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions like CDH5 and HDP
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Strong knowledge of creating and monitoring Hadoop clusters on VMs, Hortonworks Data Platform 2.1 & 2.2, and CDH3/CDH4 with Cloudera Manager on Linux (Ubuntu) OS
- Knowledge of MS SQL Server
- Strong knowledge of the Software Development Life Cycle (SDLC)
- Strong understanding of Agile and Waterfall SDLC methodologies
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Good knowledge of creating reports using Tableau
TECHNICAL SKILLS
Hadoop/Big Data: Hadoop, MapR, HDFS, Zookeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, Yarn, HBase, Spark with Scala
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: Java, J2EE, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts
Frameworks: MVC, Struts, Spring, Hibernate
Operating Systems: Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
Tools and IDEs: Eclipse, IntelliJ
WORK EXPERIENCE
Confidential, New York, New York
Hadoop Developer/Admin
Responsibilities:
- Worked on analyzing a live 65-node Hadoop cluster and different big data analytic tools, including Pig, the HBase database, and Sqoop.
- Developed UNIX shell wrappers to run various Ab Initio jobs.
- Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Built Ab Initio graphs to read data from the Hadoop ecosystem and load it into the target SQL Server.
- Managed and reviewed Hadoop log files and debugged failed jobs.
- Implemented Kerberos Security Authentication protocol for production cluster.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on clusters of different sizes on the Cloudera and Hortonworks distributions.
- Worked with Infrastructure teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Backed up data to a remote cluster on a regular basis using DistCp (see the DistCp sketch at the end of this section).
- Worked in a multi-cluster environment and set up the Hortonworks Hadoop ecosystem.
- Provided cluster coordination services through ZooKeeper.
- Loaded datasets into Hive for ETL operations.
- Created strategic partnerships within the Big Data space with companies like Hortonworks, Cloudera, and DataStax.
- Automated all jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports (see the Sqoop export sketch at the end of this section).
- Installed Hortonworks Hadoop using Ambari.
- Worked with the BI team by partitioning and querying the data in Hive.
- Involved in analyzing large data sets to determine the optimal way to aggregate and report on them.
Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Splunk, Java, Puppet, Apache YARN, Pig, Spark, Tableau
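A minimal sketch of the kind of DistCp backup job referenced above; the NameNode hostnames and HDFS paths are hypothetical placeholders, not values from the original engagement.

```bash
#!/usr/bin/env bash
# DistCp backup sketch (assumed hostnames and paths).
# Copies an HDFS directory from the primary cluster to a remote backup cluster,
# updating only changed files and preserving block size and replication.
SRC="hdfs://primary-nn:8020/data/warehouse"      # hypothetical source NameNode/path
DEST="hdfs://backup-nn:8020/backups/warehouse"   # hypothetical remote backup cluster

hadoop distcp -update -pbr "$SRC" "$DEST"
```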
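And a hedged sketch of a Sqoop export of analyzed Hive output to a relational database; the JDBC URL, credentials, table, and export directory are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Sqoop export sketch: push analyzed data from HDFS to a relational table.
# The connection string, username, table, and export directory are hypothetical.
sqoop export \
  --connect jdbc:mysql://reporting-db:3306/analytics \
  --username report_user -P \
  --table daily_metrics \
  --export-dir /user/hive/warehouse/daily_metrics \
  --input-fields-terminated-by '\001' \
  -m 4
```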
Confidential, New York, Maryland
Hadoop Developer/Admin
Responsibilities:
- Day-to-day responsibilities included solving developer issues, deploying code from one environment to another, providing access to new users, providing instant solutions to reduce impact, documenting them, and preventing future issues.
- Added and installed new components and removed them through Cloudera Manager.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades.
- Worked with cloud services like Azure and was involved in ETL, data integration, and migration.
- Wrote Lambda functions in Python that invoke Python scripts to perform various transformations and analytics on large data sets in EMR clusters.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
- Responsible for designing and developing the business components using Java.
- Created Java classes and interfaces to implement the system.
- Designed, built, and deployed a multitude of applications utilizing almost all of the Azure stack, focusing on high availability, fault tolerance, and auto-scaling.
- Designed and developed automation test scripts using Python.
- Designed and implemented Azure cloud infrastructure utilizing ARM templates.
- Orchestrated hundreds of Sqoop scripts, Python scripts, and Hive queries using Oozie workflows and sub-workflows (see the Oozie sketch at the end of this section).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
- Partitioned and queried the data in Hive for further analysis by the BI team
- Extended the functionality of Hive and Pig with custom UDFs and UDAFs written in Java.
- Involved in extracting the data from various sources into Hadoop HDFS for processing
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database, and Sqoop.
- Created and truncated HBase tables in Hue and took backups of submitter ID(s).
- Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.
- Created and managed Azure Web Apps and provided access permissions to Azure AD users.
- Commissioned and decommissioned nodes on a CDH5 Hadoop cluster on Red Hat Linux (see the decommissioning sketch at the end of this section).
- Involved in loading data from the Linux file system to HDFS.
- Experience configuring Storm to load data from MySQL to HBase using JMS.
- Worked with BI teams in generating the reports and designing ETL workflows on Tableau
- Experience in managing and reviewing Hadoop log files
Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python
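A hedged sketch of how an Oozie-coordinated Sqoop/Hive workflow is typically launched from the command line; the Oozie URL, HDFS application path, and property values are illustrative assumptions, not the original project's configuration.

```bash
#!/usr/bin/env bash
# Oozie workflow launch sketch (assumed URLs and paths).
# job.properties points Oozie at a workflow definition stored in HDFS.
cat > job.properties <<'EOF'
nameNode=hdfs://nn-host:8020
jobTracker=rm-host:8032
oozie.wf.application.path=${nameNode}/user/etl/workflows/daily_ingest
queueName=default
EOF

# Submit and start the workflow, then query its status by job id.
JOB_ID=$(oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run | awk -F': ' '{print $2}')
oozie job -oozie http://oozie-host:11000/oozie -info "$JOB_ID"
```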
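A minimal sketch of the DataNode decommissioning step mentioned above, assuming the exclude file lives at a hypothetical path; on a CDH cluster this is usually driven through Cloudera Manager, so this only shows the equivalent command-line form.

```bash
#!/usr/bin/env bash
# DataNode decommissioning sketch (assumed exclude-file path and hostname).
EXCLUDE_FILE=/etc/hadoop/conf/dfs.exclude   # path referenced by dfs.hosts.exclude (assumption)
NODE=worker-node-17.example.com             # hypothetical node being retired

# Add the node to the exclude list and ask the NameNode to re-read it;
# HDFS then re-replicates the node's blocks before marking it decommissioned.
echo "$NODE" >> "$EXCLUDE_FILE"
hdfs dfsadmin -refreshNodes

# Watch progress until the node reports "Decommissioned".
hdfs dfsadmin -report | grep -A 1 "$NODE"
```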
Confidential, New York, New York
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology.
- Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive, and MapReduce.
- Developed data pipeline using Flume, Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Used Pig to perform data validation on the data ingested using Sqoop and Flume, and pushed the cleansed data set into HBase.
- Participated in development/implementation of Cloudera Hadoop environment.
- Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Worked with Zookeeper, Oozie, and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
- Designed and built the Reporting Application, which uses the Spark SQL to fetch and generate reports on HBase table data.
- Extracted the needed data from the server into HDFS and bulk loaded the cleaned data into HBase.
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Involved in running MapReduce jobs to process millions of records.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Hive queries and Pig scripts to analyze large datasets.
- Involved in importing and exporting data between RDBMS and HDFS using Sqoop (see the Sqoop import sketch at the end of this section).
- Involved in generating ad hoc reports using Pig and Hive queries.
- Used Hive to analyze data ingested into HBase using Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Provide operational support for Hadoop and/or MySQL databases
- Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
- Loaded the aggregated data into Oracle from the Hadoop environment using Sqoop for reporting on the dashboard.
- Involved in installing and configuring Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster using Cloudera Manager.
- Capable of handling Hadoop cluster installations in various environments such as Unix, Linux, and Windows; able to implement and execute Pig Latin scripts in the Grunt shell.
- Strong capability to utilize Unix shell programming methods; able to diagnose and resolve complex issues.
- Experienced with file manipulation and advanced research to resolve various problems and correct integrity for critical Big Data issues with the NoSQL/Hadoop HDFS data store.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Translated high level requirements into ETL process.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Implemented NameNode backup using NFS for high availability.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Created Hive external tables, loaded data into them, and queried the data using HQL (see the Hive external table sketch at the end of this section).
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Involved in installing the Oozie workflow engine in order to run multiple Hive and Pig jobs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Hadoop, MapR, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Oozie, CDH3, MongoDB, Cassandra, HBase, Eclipse, Oracle, and Unix/Linux.
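A minimal sketch of the Sqoop import direction referenced above; the JDBC connection, table, split column, and target directory are hypothetical.

```bash
#!/usr/bin/env bash
# Sqoop import sketch: pull a table from an RDBMS into HDFS (assumed connection details).
sqoop import \
  --connect jdbc:oracle:thin:@db-host:1521:ORCL \
  --username etl_user -P \
  --table CUSTOMER_ORDERS \
  --target-dir /user/etl/raw/customer_orders \
  --split-by ORDER_ID \
  -m 4
```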
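A hedged sketch of creating and querying a Hive external table over HDFS data; the table name, schema, and location are assumptions for illustration only.

```bash
#!/usr/bin/env bash
# Hive external table sketch (assumed schema and HDFS location).
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  log_time   STRING,
  user_id    STRING,
  url        STRING,
  status     INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/etl/raw/web_logs';

-- Example HQL query against the external table
SELECT status, COUNT(*) AS hits
FROM web_logs
GROUP BY status;
"
```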
Confidential, Westlake, Texas
SDET
Responsibilities:
- Involved in almost all the phases of SDLC.
- Executed test cases manually and logged defects using ClearQuest
- Automated the functionality and interface testing of the application using Quick Test Professional (QTP)
- Designed, developed, and maintained the automation framework (Hybrid Framework).
- Analyzed requirements and prepared automation script scenarios.
- Developed test data for regression testing using QTP.
- Wrote test cases in IBM Rational Manual Tester.
- Conducted cross-browser testing on different platforms.
- Performed client application testing and web-based application performance, stress, volume, and load testing of the system using LoadRunner 9.5.
- Analyzed the performance of the application program itself under various test loads of many simultaneous Vusers.
- Analyzed the impact on server performance (CPU usage and server memory usage) for applications under varied numbers of simultaneous users.
- Inserted transactions and rendezvous points into Web Vuser scripts.
- Created Vuser scripts using VuGen and used the Controller to generate and execute LoadRunner scenarios.
- Fully involved in requirement analysis and documentation of the requirement specification.
- Prepared use-case diagrams, class diagrams and sequence diagrams as part of requirement specification documentation.
- Involved in design of the core implementation logic using MVC architecture.
- Used Apache Maven to build and configure the application.
- Developed JAX-WS web services to provide services to the other systems.
- Developed JAX-WS clients to utilize a few of the services provided by other systems.
- Involved in developing EJB 3.0 stateless session beans for the business tier to expose business services to the services component as well as the web tier.
- Implemented Hibernate at DAO layer by configuring hibernate configuration file for different databases.
- Developed business services to utilize Hibernate service classes that connect to the database and perform the required action.
- Developed JavaScript validations to validate form fields.
- Performed unit testing for the developed code using JUnit.
- Developed design documents for the code developed.
- Used SVN repository for version control of the developed code.
Environment: SQL, Oracle 10g, Apache Tomcat, HP LoadRunner, IBM Rational Robot, ClearQuest, Java, J2EE, HTML, DHTML, XML, JavaScript, Eclipse, WebLogic, and PL/SQL.