Hadoop Developer Resume
Kansas City, MO
SUMMARY
- Results-oriented professional with 8 years of overall IT experience, including analysis, design, development, integration, deployment and maintenance of quality software applications using Java and Big Data Hadoop technologies.
- Around 3 years of experience with the Big Data Hadoop ecosystem, covering ingestion, storage, querying, processing and analysis of big data.
- Competent in processing large sets of structured, semi-structured and unstructured data, reinforced with System Application Architecture experience.
- Worked closely on data architecture, including data ingestion pipeline design, Hadoop information architecture, data modelling and mining, machine learning and advanced data processing.
- Proactive participation in migration and support, and in coordinating with external stakeholders.
- Good exposure to Apache Hadoop MapReduce programming, Hive, Pig scripting, Hortonworks and HDFS.
- Extensively used Apache Kafka to load the log data from multiple sources directly into HDFS.
- Expertise in writing custom UDFs and UDAFs extending Pig and Hive core functionality.
- Experienced in optimizing Extract, Transform, Load (ETL) workflows.
- Very good experience with both MapReduce 1 (JobTracker) and MapReduce 2 (YARN) setups.
- Experience in debugging and solving problems that occur in the Hadoop ecosystem.
- Extensively worked with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs.
- Experience in managing Hadoop clusters using Cloudera Manager Tool.
- Experience in using Cloudera Manager for installation and management of single-node and multi-node Hadoop clusters (CDH3, CDH4 & CDH5).
- Configured security requirements for Hadoop and integrated with Kerberos authentication and authorization infrastructure.
- Worked on extraction and insertion of data using Sqoop between HDFS and relational database systems/mainframes, and vice versa.
- Experience with the Spark processing framework, including Spark Core and Spark SQL.
- Experienced in designing, building and deploying a multitude of applications utilizing much of the AWS stack (including EC2, Route 53, S3, RDS, DynamoDB, SQS, IAM and EMR), focusing on high availability, fault tolerance and auto-scaling.
- Expert skills in Python, Core Java and J2EE technologies such as Java multithreading, object-oriented design patterns, exception handling, Servlets, garbage collection, JSP, HTML, Struts, Hibernate, Spring MVC, Enterprise JavaBeans, JDBC, RMI, JNDI and AJAX.
TECHNICAL SKILLS
Big Data Technologies: MapReduce, Cassandra, Pig, Hive, HDFS, Spark, Sqoop, Flume, ZooKeeper, Mahout, Oozie, Avro, HBase, Storm, CDH 5.3, CDH 5.4
Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia
Methodologies: Software Development Life Cycle (SDLC), Agile, Scrum, Waterfall
Languages: Java, Shell Scripting, Python, Scala, Linux, RHEL
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2, Teradata
NoSQL Databases: HBase, Cassandra, MongoDB, DynamoDB
Cloud & Web Servers: AWS (EC2, S3, RDS, ELB), Redshift, Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL, AWS
ETL Tools: Informatica, Pentaho, SSRS, SSIS, BO, Crystal reports, Cognos.
Testing Frameworks: Junit, MRUnit
Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP
Security: Kerberos
PROFESSIONAL EXPERIENCE
Hadoop Developer
Confidential, Kansas City, MO
Responsibilities:
- Worked on a live 60-node Hadoop cluster running Cloudera CDH 5.4.
- Performed both major and minor upgrades to the existing CDH cluster.
- Worked with highly unstructured and semi-structured data of 90 TB in size (270 TB with a replication factor of 3).
- Used Sqoop to move structured data from Teradata, Oracle and SQL Server.
- Created Sqoop (version 1.4.3) jobs with incremental load to populate Hive external tables.
- Extensive experience in writing Pig (version 0.11) scripts to transform raw data from several data sources into baseline data.
- Developed Hive (version 0.10) scripts for end user/analyst requirements to perform ad hoc analysis.
- Involved in capacity planning and configuration of a Cassandra cluster on DataStax.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Streamed data in real time using Spark with Kafka.
- Involved in using components of Spark including Spark SQL, Spark Streaming and MLlib.
- Used Elasticsearch for real-time search and analytics capabilities.
- Worked on converting HiveQL queries into Spark transformations using Spark RDDs, Python and Scala (see the PySpark sketch after this section).
- Involved in reading multiple data formats on HDFS using PySpark.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Worked on integrating Hadoop with Informatica, loading data into HDFS and Hive.
- Wrote RDD transformations in Scala to execute in Spark and measure the performance benefit against MapReduce.
- Worked on Amazon Redshift, the data warehouse product within AWS (Amazon Web Services).
- Developed multiple POCs using PySpark, deployed them on the YARN cluster and compared the performance of Spark with Hive and SQL/Teradata.
- Leveraged Flume to stream data from a Spool Directory source to an HDFS sink using the Avro protocol.
- Experience in using Sqoop to import data into Cassandra tables from different RDBMSs.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
- Prepared, arranged and tested Splunk search strings and operational strings.
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
- Involved in Cluster coordination services through Zookeeper and adding new nodes to an existing cluster.
- Developed enterprise Lucene/SOLR-based solutions, including custom type/object modelling and implementation in the Lucene/SOLR analysis (tokenizers/filters) pipeline.
- Created custom SOLR query components to enable optimum search matching.
- Planned, deployed, monitored and maintained AWS cloud infrastructure consisting of multiple EC2 nodes and VMware VMs as required in the environment.
- Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, creating Kerberos principals and testing HDFS and Hive access.
- Developed unit test cases using JUnit and MRUnit, and performed integration and system testing.
- Developed simple to complex MapReduce streaming jobs in Python, alongside Hive and Pig (a minimal streaming sketch also follows this section).
- Developed complex Talend jobs to migrate data from flat files to databases.
- Administered Tableau Server, backing up reports and granting privileges to users.
- Extracted and updated data in MongoDB using the MongoDB import and export command-line utilities.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Implemented a POC using Apache Impala for data processing on top of Hive.
- Configured High Availability in the cluster and configured security with Kerberos.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Splunk, Spark, Storm, AWS EC2, Redshift, Kafka, SOLR, LINUX, RHEL, Cloudera, Agile Big Data, Scala, Python, SQL, HiveQL, NoSQL, Cassandra, Elastic Search, Tableau, Talend, Teradata, HBase.
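A minimal PySpark sketch of the HiveQL-to-Spark conversion and multi-format HDFS reads described above. The table, column and path names (customer_events, region, hdfs:///data/...) are illustrative placeholders rather than the project's actual schema, and a Spark 2.x SparkSession with Hive support is assumed.

    from pyspark.sql import SparkSession, functions as F

    # Spark session with Hive support, so HiveQL and DataFrame code can coexist
    spark = (SparkSession.builder
             .appName("hiveql-to-spark-poc")
             .enableHiveSupport()
             .getOrCreate())

    # Read a few common HDFS formats (paths are placeholders)
    json_df = spark.read.json("hdfs:///data/raw/events_json/")
    csv_df = spark.read.option("header", "true").csv("hdfs:///data/raw/events_csv/")

    # Equivalent of a HiveQL aggregation, expressed as DataFrame transformations
    agg_df = (json_df
              .filter(F.col("event_type") == "purchase")
              .groupBy("region")
              .agg(F.count("*").alias("purchases"),
                   F.avg("amount").alias("avg_amount")))

    # The same logic can also run as plain HiveQL through Spark SQL
    agg_sql = spark.sql("""
        SELECT region, COUNT(*) AS purchases, AVG(amount) AS avg_amount
        FROM customer_events
        WHERE event_type = 'purchase'
        GROUP BY region
    """)

    # Persist the curated result back to HDFS
    agg_df.write.mode("overwrite").parquet("hdfs:///data/curated/purchases_by_region/")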
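A skeleton of a Hadoop Streaming job in Python, of the kind referenced above. The file name, input/output paths and the word-count logic are illustrative assumptions, not the original job.

    #!/usr/bin/env python
    # wordcount_streaming.py -- illustrative Hadoop Streaming job (hypothetical file name).
    # Run (jar location and paths depend on the distribution):
    #   hadoop jar hadoop-streaming.jar \
    #     -files wordcount_streaming.py \
    #     -mapper "python wordcount_streaming.py map" \
    #     -reducer "python wordcount_streaming.py reduce" \
    #     -input /data/raw/logs -output /data/out/wordcount
    import sys

    def mapper():
        # Emit key<TAB>1 per token; Hadoop sorts by key before the reduce phase
        for line in sys.stdin:
            for token in line.strip().split():
                print("%s\t1" % token)

    def reducer():
        # Input arrives grouped by key, so a running total per key is enough
        current_key, current_count = None, 0
        for line in sys.stdin:
            key, value = line.rstrip("\n").split("\t", 1)
            if key != current_key and current_key is not None:
                print("%s\t%d" % (current_key, current_count))
                current_count = 0
            current_key = key
            current_count += int(value)
        if current_key is not None:
            print("%s\t%d" % (current_key, current_count))

    if __name__ == "__main__":
        reducer() if len(sys.argv) > 1 and sys.argv[1] == "reduce" else mapper()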
Hadoop Developer
Confidential, Kansas City, MO
Responsibilities:
- Used Sqoop to transfer data between RDBMS and HDFS.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache (a streaming-style sketch of this pattern follows this section).
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Worked with the team to grow the cluster from 32 nodes to 46 nodes; the additional data nodes were configured through the Hadoop commissioning process.
- Designed and implemented custom Writables, custom input formats, custom partitioners and custom comparators in MapReduce.
- Thoroughly tested MapReduce programs using the MRUnit and JUnit testing frameworks.
- Worked with SQL Workbench to load and aggregate data from S3 into Redshift (a hedged COPY sketch also follows this section).
- Developed data pipelines to process data from the source systems directly into the Redshift database.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Worked on Elasticsearch and created various scripts.
- Performed Splunk administration tasks such as installing, configuring, monitoring and tuning.
- Experienced in analysing SQL scripts and designing solutions to implement them using PySpark.
- Set up Kerberos locally on a 5-node POC cluster using Ambari, evaluated the performance of the cluster and did an impact analysis of Kerberos enablement.
- Applied Apache Storm with a Kafka spout and HBase/Redis for real-time analytics, designing spouts, bolts and topologies.
- Effectively used Oozie to develop automatic workflows of Sqoop, MapReduce and Hive jobs.
- Installed and configured Informatica PowerCenter on client and server machines, configured the Informatica server and registered servers.
- Familiar with implementing Amazon Elastic MapReduce (Amazon EMR) for big data processing.
- Used the JSON and Avro SerDes packaged with Hive for serialization and deserialization to parse record contents.
- Configured and integrated Lucene/SOLR for full-text search of ALDOT's documents.
- Wrote multiple MapReduce programs for extraction, transformation and aggregation of data from multiple file formats, including XML, JSON, CSV and other compressed file formats.
- Defined job flows and developed simple to complex MapReduce jobs as per requirements.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on developing ETL processes (DataStage & Talend Open Studio) to load data from multiple data sources to HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
- Used Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS and in databases such as HBase and MongoDB, using Scala (a PySpark variant is sketched after this section).
- Worked on regular expression related text-processing using the in-memory computing capabilities of Spark using Scala.
- Involved in NoSQL database design, integration and implementation.
- Loaded data into the NoSQL database HBase.
- Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
- Explored the Spark MLlib library to do a POC on recommendation engines.
Environment: Hadoop, CDH4, MapReduce, HDFS, Pig, Hive, Impala, Oozie, Java, Spark, SOLR, Kafka, Flume, Storm, Linux, Talend, Scala, Maven, Elastic Search, Python, PySpark, Splunk, Oracle 11g/10g, SVN.
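The map-side join via the distributed cache noted above was implemented in Java; the sketch below illustrates the same pattern with a Python Hadoop Streaming mapper. File names and field layouts are placeholders.

    # join_mapper.py -- map-side join sketch; the small lookup file is shipped to every
    # mapper through the distributed cache, e.g.:
    #   hadoop jar hadoop-streaming.jar -files countries.tsv,join_mapper.py \
    #     -mapper "python join_mapper.py" -reducer NONE \
    #     -input /data/events -output /data/events_joined
    import sys

    def load_lookup(path="countries.tsv"):
        # The shipped file appears in the task's working directory
        lookup = {}
        with open(path) as f:
            for line in f:
                code, name = line.rstrip("\n").split("\t", 1)
                lookup[code] = name
        return lookup

    def main():
        countries = load_lookup()
        for line in sys.stdin:
            fields = line.rstrip("\n").split("\t")
            country_code = fields[-1]
            # Join happens on the map side: no shuffle and no reducer needed
            print("\t".join(fields + [countries.get(country_code, "UNKNOWN")]))

    if __name__ == "__main__":
        main()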
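A hedged sketch of the S3-to-Redshift load mentioned above, issued from Python via psycopg2. The cluster endpoint, credentials, IAM role, table and bucket names are placeholders.

    import psycopg2

    # Connect to the Redshift cluster (endpoint and credentials are placeholders)
    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="etl_user", password="...")
    conn.autocommit = True

    # Redshift COPY pulls the S3 files in parallel into the staging table
    copy_sql = """
        COPY staging.click_events
        FROM 's3://example-bucket/click_events/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS CSV GZIP
        TIMEFORMAT 'auto';
    """

    with conn.cursor() as cur:
        cur.execute(copy_sql)
    conn.close()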
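The Kafka-to-HDFS Spark Streaming flow above was written in Scala; a PySpark equivalent using the Spark 1.x DStream API is sketched below, with the broker list, topic name and output path as placeholders.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka-to-hdfs")
    ssc = StreamingContext(sc, 30)   # 30-second micro-batches

    # Direct (receiver-less) Kafka stream; records arrive as (key, value) pairs
    stream = KafkaUtils.createDirectStream(
        ssc, ["events"], {"metadata.broker.list": "broker1:9092,broker2:9092"})

    # Keep only the message payload and persist each batch to HDFS as text files
    stream.map(lambda kv: kv[1]).saveAsTextFiles("hdfs:///data/streaming/events")

    ssc.start()
    ssc.awaitTermination()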
Hadoop Developer
Confidential, Harrisburg, PA
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions (a Python equivalent is sketched after this section).
- Managing and scheduling Jobs on a Hadoop cluster.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Wrote MapReduce jobs using Java API.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Installed and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase and Sqoop.
- Installed and configured Pig and wrote Pig Latin scripts.
- Developed Pig UDFs to pre-process the data for analysis.
- Developed Hive queries for the analysts.
- Wrote Hive queries for data analysis to meet the business requirements.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Took part in monitoring, troubleshooting and managing Hadoop log files.
Environment: Apache Hadoop, Java (jdk1.6), DataStax, Flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, UNIX, Sqoop, Hive, Oozie.
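The daemon health checks above were shell scripts; the sketch below shows the equivalent idea in Python, with the daemon list and alerting behavior as illustrative assumptions for a host running MR1-era services.

    import subprocess
    import sys

    # Illustrative per-host list; the real checks depended on which daemons ran on each node
    REQUIRED_DAEMONS = ["NameNode", "DataNode", "JobTracker", "TaskTracker"]

    def running_java_processes():
        # 'jps' lists the JVM processes (Hadoop daemons) running on this host
        output = subprocess.check_output(["jps"]).decode("utf-8")
        return {line.split()[1] for line in output.splitlines() if len(line.split()) > 1}

    def main():
        running = running_java_processes()
        missing = [d for d in REQUIRED_DAEMONS if d not in running]
        if missing:
            # In the real script this hooked into the alerting/restart logic
            sys.stderr.write("WARNING: daemons not running: %s\n" % ", ".join(missing))
            sys.exit(1)
        print("All monitored Hadoop daemons are running.")

    if __name__ == "__main__":
        main()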
Java\J2ee Developer
Confidential
Responsibilities:
- Analysed business requirements and participated in user story estimation in an Agile methodology.
- Prepared design documents.
- Created Spring RESTful web services consuming and producing JSON/XML using JAX-RS APIs, XSLT and XSL.
- Developed web pages and backend flows, and integrated backend data into web pages using Spring 2.5 MVC, Hibernate 3.0, jQuery and JSP/Servlets.
- Used the Sonar tool for code reviews.
- Wrote JUnit test cases.
- Wrote REST-based web services and consumed them in the web application.
- Performed activities - debugging, coding, testing and defect fixing.
- Conducted code reviews within the team.
- Led a development team of 5 members and shared technical oversight.
Environment: Agile, JDK 1.6, Spring 2.5, JSP, Servlets, Spring MVC, RESTful, XML, JavaScript, SAX, JAXB, Ajax, Hibernate, Log4j, WebLogic Application Server and Oracle 11g
Java Developer
Confidential
Responsibilities:
- Involved in gathering requirements, Analysis, Design, Development and testing of the entire Application.
- Involved in all phases of SDLC (Software Development Life Cycle).
- Created UML diagrams such as class diagrams and activity diagrams using Rational Rose.
- Participated in the design and development of application using JSP, HTML, CSS and JavaScript.
- Used Eclipse as IDE tool to develop the application and JIRA for bug and issue tracking.
- Designed and Developed the presentation layer using AJAX for RUI (Rich User Interface).
- Used JSON in conjunction with JavaScript for making HTTP requests.
- Used jQuery for validation.
- Developed the presentation tier of the application using Struts framework and MVC design pattern.
- Configured the Hibernate ORM framework as the persistence layer for the backend using hibernate.cfg.xml.
- Designed and developed DAOs for accessing POJOs and updating DB tables using POJOs, Java Collections and synchronization.
- Used Hibernate object relation mappings (ORM) for the database operations on MySQL.
- Developed and modified stored procedures and the DAO (Data Access Object) and VO (Value Object) classes to separate data access logic from business logic.
- Extensively participated in application integration; Spring was used to integrate Struts and Hibernate, and interceptors were implemented for Spring and Hibernate.
- Transactions were implemented using declarative transactions in Spring with transaction managers capable of supporting Hibernate.
- Configuration issues in the various frameworks used were identified and resolved to extract an acceptable level of performance in terms of efficiency, response and robustness.
- Consumed web services as a gateway for payments through a third party.
- Developed Web Services using SOA, SOAP, WSDL, UDDI and JAX-WS, JAX-RPC programming models.
- Used Ant as the build tool for building and deploying to WebLogic Server; Ant scripts automated the build process.
- Developed and executed unit tests and test suites for product components using JUnit.
Environment: Core Java, J2EE 1.6.x, JDK, JSP, Struts 2.x, Tiles, JMS, Spring 3.x, Hibernate 3.0, MySQL, Eclipse, WebSphere Application Server, JBoss, JSON, AJAX, jQuery, Web Services (SOAP, WSDL), Ant, JavaScript, CSS, Log4J, JUnit, HTML, PL/SQL, CVS and DB2.
Java Developer
Confidential
Responsibilities:
- Developed an intranet web application using JEE architecture, with JSP to design the user interfaces.
- Developed the application based on MVC architecture using the Spring Framework; designed action classes and form beans.
- Developed several web pages using JSP and HTML/CSS.
- Used JavaScript/jQuery to perform checks and validations on the client side.
- Evaluated and worked with JDBC for Persistence.
- Implemented test cases for Unit testing of modules using JUnit
- Involved in deploying web and enterprise applications in Tomcat 6.
- Used SLF4J/Log4j for logging purposes; monitored the error logs using Log4j and fixed the problems detected.
Environment: Java 1.5, J2EE, JSP, Servlets, HTML, JavaScript, jQuery, JDBC, Spring, XML, JUnit, WSAD, WebSphere Application Server, ANT, Rational Rose, Oracle 9i, Windows XP.