Sr. Hadoop Developer Resume
SUMMARY
- Results-driven IT professional with 8+ years of experience in the development, testing, and implementation of Business Intelligence and data warehousing solutions.
- Extensive experience with Apache Hadoop ecosystem components such as HDFS, MapReduce, Hive (HiveQL), and Pig.
- Experience installing Cloudera Hadoop CDH4 on an Amazon EC2 cluster.
- Experience installing, configuring, and administering Hadoop clusters on the major Hadoop distributions.
- Hands-on experience running MapReduce jobs through HiveQL and Pig Latin.
- Expert in installing, configuring, and using ecosystem components such as Hadoop, MapReduce, HDFS, Oozie, Hive, Sqoop, Pig, and Flume.
- Expert in implementing database projects, including analysis, design, development, testing, and implementation of end-to-end IT solutions.
- Extensive knowledge of RDBMS and of developing database applications involving stored procedures, views, triggers, user-defined data types, and functions.
- Good knowledge of all phases of the software development life cycle (SDLC), including system analysis and design, software development, testing, implementation, and documentation.
- Experience extracting data from log files and copying it into HDFS using Flume.
- Proficient in writing MapReduce programs and using the Apache Hadoop MapReduce API to analyze structured and unstructured data, including processing RSS feeds in MapReduce.
- Experience with streaming tools: Spark Streaming, Spark Structured Streaming, and Kafka Streams.
- Experience developing Sqoop jobs to import data from RDBMS sources into HDFS and to export data from HDFS into relational tables.
- Excellent logical, analytical, communication, and interpersonal skills, with an exceptional ability to learn new concepts and complex systems quickly.
TECHNICAL SKILLS
Big Data Technologies: Hadoop, Scala 2.11.8, HDFS, Hive, MapR 2.7.0, Pig, Sqoop, Flume, Oozie, HBase, Spark 2.2.0, Python 2.7, Kafka
Programming Languages: Java (5, 6, 7), Python, C, C++
Databases/RDBMS: MySQL, SQL/PL-SQL, MS-SQL Server 2005, Oracle 9i/10g/11g, DB2, Azure SQL Server 2017
Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL
ETL Tools: Informatica
Operating Systems: Linux CentOS 6.9, Windows XP/7/8/10, UNIX
Software Life Cycles: SDLC, Waterfall and Agile models
Office Tools: MS-Office, MS-Project and Risk Analysis tools
Utilities/Tools: Eclipse, Tomcat, NetBeans, IntelliJ IDEA CE, JUnit, SQL, Automation, MR-Unit, Airflow 1.10.2 Scheduler, Jenkins 2.107.3
Cloud Platforms: Amazon EC2
Java Frameworks: MVC, Apache Struts 2.0, Spring and Hibernate
NoSQL Database: Cassandra, HBase, DynamoDB
PROFESSIONAL EXPERIENCE
Confidential
Sr. Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed solutions on the MapR Hadoop distribution.
- Worked with the CBB (Customer Back Bone) team to retrieve all IDs required to fulfill customer action requests through ID Lookup.
- Developed Spark scripts that pass a sequence DataFrame as input to the CCPA GraphX traversal JAR, which traverses linked IDs for ID Lookup.
- Used Scala with Spark to load JSON data from a REST API into a DataFrame, and used DataFrame transformations and the Spark SQL API for faster data processing.
- Optimized Spark job configurations for real-time and batch workloads, and tested configurations to find the best settings for joins of roughly 800 GB and 1 TB, with and without bucketing.
- Developed automated SQL Server reports for ID Lookup, delivered by email through shell scripts.
- Created Airflow DAGs (Directed Acyclic Graphs) in Python to schedule tasks automatically and to send email alerts when a task fails.
- Implemented unit test cases using shell scripts.
- Performed traversal graph analysis on Hive tables, with thresholds based on the mean and variance of the linkages.
- Created Looper jobs in Jenkins and wrote operational playbooks.
- Created a support plan for manual and automated processes.
- Performed end-to-end testing in staging and production, and sent daily reports on requests received from ServiceNow and processed by CBB.
- Documented operational playbooks for ID Lookup and ServiceNow acceptance support for the team.
Environment: Hadoop YARN, Spark 2.2.0, Spark Core, Spark SQL, Scala 2.11.8, Python 2.7, MapR 2.7.0, Hive 0.13.1, Airflow Scheduler 1.10.2, Linux CentOS 6.9, Azure SQL Server 2017, ServiceNow, Jira, Jenkins 2.107.3, IntelliJ IDEA CE
Confidential - North Chicago, IL
Hadoop Developer
Responsibilities:
- Gathered requirement specifications from SMEs and Business Analysts in the BR and SR meetings for the corporate workplace project.
- Worked with business users to build sample report layouts.
- Wrote HLDs along with RTMs tracing back to the corresponding BRs and SRs, and reviewed them with the business.
- Implemented an enterprise-level Transfer Pricing System to ensure tax-efficient supply chains and achieve entity profit targets.
- The IOP implementation involved understanding business requirements and solution design, translating the design into model construction, loading data using ETL logic, validating data, and creating several custom reports to meet end-user requirements.
- Installed and configured Apache Hadoop and the Hive/Pig ecosystem.
- Installed and configured Cloudera Hadoop CDH4 via Cloudera Manager in pseudo-distributed and cluster modes.
- Developed Python APIs that represent the memory subsystem.
- Developed Hive UDFs and Pig UDFs using Python in Microsoft HDInsight environment.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with HiveQL.
- Developed Python APIs to dump the processor's array structures at the point of failure for debugging.
- Developed MapReduce programs to extract and transform data sets, and loaded the resulting data into Cassandra, and from Cassandra back out, using Kafka 2.0.x.
- Created Hive tables, loaded them with data, and wrote Hive queries that run as MapReduce jobs in the backend.
- Developed a Spark Streaming custom receiver to process data from RabbitMQ into Cassandra and Aerospike tables.
- Worked on XML stubs, integrating them with the Excel VB code and the backend database.
- Created MapReduce jobs using Hive and Pig queries.
- Used NoSQL database services such as DynamoDB.
- Responsible for data ingestion using Flume and Kafka.
- Involved in installing, configuring and managing Hadoop Ecosystem components like Spark, Hive, Pig, Sqoop, Kafka and Flume.
- Used Spark Streaming and the Spark SQL API to process files.
- Used Apache Spark with Python to develop and run big data analytics.
- Imported and exported data into and out of HDFS using Sqoop, Flume, and Kafka.
- Designed outbound packages to dump IOP-processed data into the Out tables for the data warehouse and the Cognos BI team.
- Used DB2 to store, analyze, and retrieve data.
- Participated in unit testing, system integration testing, and UAT after development.
- Provided end-user training and configured reports in IOP.
Environment: Oracle IOP, Apache Hadoop, HDFS, Sqoop, Flume, Kafka, Cassandra, Cloudera Hadoop CDH4, HiveQL, Pig Latin, Spark, DynamoDB
Confidential - Great Neck, NY
Hadoop Developer
Responsibilities:
- Worked as a Senior Developer for the project.
- Used Enterprise JavaBeans as middleware to develop a three-tier distributed application.
- Developed session beans and entity beans for business and data processing.
- Implemented RESTful web services.
- Developed user interface using HTML, CSS, JSPs and AJAX.
- Performed client-side validation with JavaScript and jQuery, and applied server-side validation to the web pages as well.
- Used JIRA for bug tracking of the web application.
- Wrote Spring Core and Spring MVC files to connect the DAO layer with the business layer.
- Worked with HTML, DHTML, CSS, and JavaScript in UI pages.
- Wrote SOAP web services to send data to and receive data from the external interface.
- Extensively worked with JUnit framework to write JUnit test cases to perform unit testing of the application.
- Developed a Spark Streaming custom receiver to process data from RabbitMQ into Cassandra and Aerospike tables.
- Developed real-time data ingestion from Kafka to Elasticsearch using Kafka input and Elasticsearch output plugins.
- Implemented JDBC modules in JavaBeans to access the database.
- Designed the tables for the back-end Oracle database.
- Hosted the application on WebLogic and developed it using the Eclipse IDE.
- Used XSL/XSLT to transform and display reports, and developed XML schemas.
- Wrote Ant scripts to build and deploy the application.
- Developed web-based reporting for the monitoring system with HTML and Tiles using the Struts framework.
- Implemented field-level validations with AngularJS, JavaScript, and jQuery.
- Prepared unit test scenarios and unit test cases.
- Used DynamoDB for running applications.
- Worked on branding the site with CSS.
- Wrote ALTER, INSERT, and DELETE queries involving lists, sets, and maps in DataStax Cassandra.
- Worked with Spark for parallel computing on RDDs built from DataStax Cassandra data.
- Worked with Scala to demonstrate to management the flexibility of Scala with Spark and Cassandra.
- Performed code reviews and unit testing.
- Used DB2, leveraging its object-oriented features and non-relational XML structures.
- Performed unit testing using JUnit.
- Implemented Log4j to trace logs and track information.
- Participated in project discussions with clients, analyzed complex project requirements, and prepared design documents.
Environment: Hive, Pig, HBase, ZooKeeper, Sqoop, Kafka, Cassandra, Cloudera, Java, JDBC, JNDI, Struts, Maven, Subversion, JUnit, SQL, DB2, Spring, Hibernate, Oracle, XML, PuTTY, Eclipse, DynamoDB
Confidential
Hadoop Developer
Responsibilities:
- Automated clickstream data collection and storage into HDFS using Flume.
- Helped create a data lake by extracting customer data from various data sources into HDFS.
- Used Sqoop to load data from Oracle Database into Hive.
- Developed MapReduce programs to cleanse the data in HDFS obtained from multiple data sources.
- Implemented various Pig UDFs for converting unstructured data into structured data.
- Developed Pig Latin scripts for data processing.
- Wrote optimized Pig scripts, and developed and tested Pig Latin scripts.
- Created Hive tables per the defined requirements, with appropriate static and dynamic partitions.
- Used Hive to analyze the data in HDFS to identify issues and behavioral patterns.
- Involved in production Hadoop cluster set up, administration, maintenance, monitoring and support.
- Implemented application logic that interacts with HBase.
- Assisted in creating large HBase tables populated with large data sets from various portfolios.
- Managed cluster coordination services through ZooKeeper.
- Wrote MapReduce jobs to efficiently put data into and fetch data from HBase.
- Developed MapReduce jobs to automate transfer of data from/to HBase.
- Assisted with the addition of Hadoop processing to the IT infrastructure.
- Used Flume to collect all web logs from the online ad servers and push them into HDFS.
- Implemented custom business logic by writing UDFs in Java, and used various UDFs from Piggybank and other sources.
- Implemented and executed MapReduce jobs to process log data from the ad servers.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Worked as a backend Java developer for the Data Management Platform (DMP), building RESTful APIs that the team and other groups used to build dashboards.
Environment: Hadoop, Pig, Sqoop, Oozie, MapReduce, HDFS, Hive, Java, Eclipse, HBase, Flume, Oracle 10g, UNIX Shell Scripting, GitHub, Maven
