Hadoop Developer Resume
Atlanta
SUMMARY:
- 7+ years of software experience, with 4 years in the design and implementation of big data applications using the Hadoop ecosystem: Hive, Pig, Oozie, Sqoop, Flume, and the HBase database.
- Hands-on experience exporting and importing data between relational database systems and HDFS using Sqoop.
- Experienced with the job workflow scheduling and monitoring tool Oozie.
- Experience collecting and aggregating large volumes of log data using Apache Flume and storing it in HDFS for further analysis.
- Experience with column-oriented NoSQL databases such as HBase and their integration with Hadoop clusters.
- Extensively worked on writing, fine-tuning, and profiling MapReduce jobs for optimized performance.
- Extensive experience implementing data analysis algorithms using MapReduce design patterns.
- Experience working on multiple platforms, including Linux and Windows.
- Good knowledge of the programming languages Python and Scala.
- Experience implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Experience converting business processes into RDD transformations using Apache Spark and Scala.
- Experience writing producers/consumers and building messaging-centric applications using Apache Kafka.
- Knowledge of integrating Apache Storm with Apache Kafka for stream processing.
- Experience supporting data analysis projects using Elastic MapReduce (EMR) on Amazon Web Services (AWS).
- Knowledge of the Splunk UI for log analysis in production support.
- Expertise in job scheduling and monitoring tools such as Azkaban, Oozie, and ZooKeeper.
- Expertise in writing shell scripts, cron automation, and regular expressions.
- Experience with source code management (SCM) tools such as Git/GitHub.
- Good knowledge of the Software Development Life Cycle (SDLC), Waterfall methodologies, and Agile models.
- Extensive experience with build tools such as Maven and Apache Ant.
- Good command of the tracking tool JIRA.
- Excellent communication, time management, organization, and problem-solving skills; self-motivated, hardworking, able to work cooperatively or independently in a team, and eager to learn.
TECHNICAL SKILLS:
Operating Systems: Windows, Linux, Unix
Hadoop Stack: HDFS, YARN, MapReduce, Sqoop, Flume, Hive, Pig, Kafka, Oozie, ZooKeeper, Spark, Scala, Spark-SQL, HBase, Splunk.
IDE: Eclipse
Version Control Tools: GIT, SVN
Build Tools: Maven, Jenkins
Database: HBase, SQL, Oracle, Sybase
Scripting Languages: Shell
Application Servers: Tomcat, IBM WebSphere
Education: Master's in Computer Science, University of Central Missouri.
Master of Computer Applications, Jawaharlal Nehru Technological University, India
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta
Hadoop Developer
Responsibilities:
- Designed and developed an ETL data pipeline using a Spark application to fetch data from legacy systems, third-party APIs, and social media sites.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Designed and developed POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Worked on Apache Spark for stream processing and graph analytics using Amazon EMR clusters.
- Worked on Spark applications and launched Spark clusters from the EMR console.
- Migrated Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext, and Scala (see the sketch after this list).
- Imported data into Spark from a Kafka consumer group using the Spark Streaming APIs.
- Worked with Sqoop import and export to handle large data set transfers between MySQL and HDFS.
- Implemented test scripts to support test driven development and continuous integration.
- Performed data analytics and loaded data to Amazon S3 and Redshift.
- Wrote and built Azkaban workflow jobs to automate the process.
- Developed Spark SQL tables and queries to perform ad-hoc data analytics for the analyst team.
- Deployed components using the Maven build system and Docker images.
- Developed shell scripts to automate the deployment process.
- Implemented Spark RDD transformations to map business analysis and apply actions on top of transformations.
- Monitored Spark clusters.
- Worked in an Agile methodology.
- Worked with Text, Avro, Parquet, and SequenceFile formats.
- Developed ETL frameworks.
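The sketch below illustrates, in minimal form, the kind of Hive-query-to-Spark DataFrame migration described above. The table and column names (claims, member_id, paid_amount) and the output path are hypothetical placeholders, not the actual project artifacts.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal sketch: converts a simple Hive aggregation query into DataFrame
// transformations. Table/column names are illustrative assumptions.
object ClaimsAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-aggregation")
      .enableHiveSupport()        // read existing Hive metastore tables
      .getOrCreate()

    // Equivalent of: SELECT member_id, SUM(paid_amount) FROM claims GROUP BY member_id
    val totals = spark.table("claims")
      .groupBy("member_id")
      .agg(sum("paid_amount").alias("total_paid"))

    // Expose the result for ad-hoc Spark SQL queries by the analyst team
    totals.createOrReplaceTempView("claim_totals")
    totals.write.mode("overwrite").parquet(args(0))   // land results as Parquet

    spark.stop()
  }
}
```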
Environment: Languages/Technologies: Scala, Spark, Spark SQL, Hive, Elasticsearch, Spring Boot. Special Software: Azkaban, Eclipse, Git repository, Amazon S3, Amazon Redshift, AWS EC2/EMR, Amazon EMR Spark cluster, Hadoop framework, Sqoop, Maven, UNIX shell scripting.
Confidential (Client: BCBS), Chicago
Hadoop Developer
Responsibilities:
- Developed a Sqoop framework to ingest historical and incremental data from Oracle, DB2, SQL Server, etc.
- Worked on Flume to read messages from a JMS queue and load them into HDFS.
- Involved in developing an audit framework, including record count validation, schema validation, and file naming pattern checks, to validate data before ingesting it into the data lake.
- Created Zena jobs and scheduled them through the Zena scheduler.
- Developed shell scripts to read files from the edge node and ingest them into HDFS partitions based on the file naming pattern.
- Created data models for the sources R2, FPDB, and CPSUPP as part of the ingestion process.
- Created Pig transformations to load data from the incoming layer to silver tables and then to smith tables.
- Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Experience in tuning Hive queries and Pig scripts to improve performance.
- As part of the audit framework, used HBase to insert and update audit entries (a sketch follows this list).
- Integrated Hive and HBase to enable Online Analytical Processing (OLAP) and Online Transaction Processing (OLTP) on the same data without redundancy.
- Developed custom UDFs to generate unique keys for use in Pig transformations.
- Identified control characters in the data and developed scripts to remove them.
- Converted existing Pig scripts to Spark to improve performance.
- Involved in a POC to read real-time messages from producers into HDFS using Flume.
- Created branches in GitHub, pushed the code, and deployed to production through Jenkins for the production release.
- Used JIRA to track bugs.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
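As a rough illustration of the HBase-backed audit entries mentioned above, the following sketch uses the standard HBase client API. The table name (ingest_audit), column family, and row-key scheme are assumptions for illustration only, not the project's actual schema.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

// Sketch of an audit-entry upsert; args: source name, business date, record count.
object AuditWriter {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()                 // picks up hbase-site.xml
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("ingest_audit"))  // hypothetical table
    try {
      // Row key: source + business date; an HBase Put is an upsert, so re-runs update in place
      val rowKey = Bytes.toBytes(s"${args(0)}_${args(1)}")
      val put = new Put(rowKey)
      put.addColumn(Bytes.toBytes("a"), Bytes.toBytes("record_count"), Bytes.toBytes(args(2)))
      put.addColumn(Bytes.toBytes("a"), Bytes.toBytes("status"), Bytes.toBytes("LOADED"))
      table.put(put)

      // Read the entry back to confirm the write
      val result = table.get(new Get(rowKey))
      println(Bytes.toString(result.getValue(Bytes.toBytes("a"), Bytes.toBytes("status"))))
    } finally {
      table.close()
      connection.close()
    }
  }
}
```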
Environment: Hortonworks, Pig, Hive, Spark, Spark SQL, Kafka, Sqoop, Flume, HBase, Zena, Eclipse, Java, shell scripting, XML, JIRA, GitHub, Jenkins.
Confidential, Minneapolis
Hadoop Developer
Responsibilities:
- Participated in the requirements gathering and analysis phase of the project, documenting business requirements by conducting workshops/meetings with various business users.
- Exported the analyzed data from HDFS to relational databases (MySQL, Oracle, Teradata) using Sqoop.
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest data into HDFS for analysis.
- Used Flume to load application server logs into HDFS.
- Responsible for writing MapReduce jobs to handle files in multiple formats (JSON, text, XML, etc.).
- Extensively worked with combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs (see the sketch after this list).
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Worked with Text, Avro, Parquet, and SequenceFile formats.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Devised schemes to collect and store large data sets in HDFS and compressed the data using various formats to achieve optimal storage.
- Designed a data warehouse using Hive.
- Created partitioned tables in Hive for best performance and faster querying.
- Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as RegEx, JSON, and Avro.
- Good knowledge of Apache Spark and Scala.
- Provided cluster coordination services through ZooKeeper.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Worked with the NoSQL database HBase for real-time data analytics.
- Implemented test scripts to support test driven development and continuous integration.
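A minimal sketch of a MapReduce job that reuses its reducer as a combiner, in the spirit of the performance work described above. It is written in Scala against the standard Hadoop MapReduce API; the input layout (tab-separated records with an event type in the third column) is an assumption for illustration, not the actual data layout.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Mapper: emits (event_type, 1) for each log record
class EventMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val outKey = new Text()

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split("\t")
    if (fields.length > 2) {          // assumed layout: event type in the third column
      outKey.set(fields(2))
      context.write(outKey, one)
    }
  }
}

// Reducer, also reused as a combiner to cut shuffle volume
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it = values.iterator()
    while (it.hasNext) sum += it.next().get()
    context.write(key, new IntWritable(sum))
  }
}

object EventCountJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "event-count")
    job.setJarByClass(classOf[EventMapper])
    job.setMapperClass(classOf[EventMapper])
    job.setCombinerClass(classOf[SumReducer])   // combiner reduces data shuffled to reducers
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```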
Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Hive, ZooKeeper, NoSQL databases, Oozie, shell scripting, Java (JDK 1.6), Cloudera Hadoop distribution, MapReduce, PL/SQL.
Confidential
Hadoop Developer
Responsibilities:
- Designed and implemented Java engine and API to perform direct calls from front-end JavaScript (ExtJS) to server-side Java methods (ExtDirect).
- Developed interfaces and their implementation classes to communicate with the mid-tier services using JMS. Technically, it is a 3-tier client-server application, where the GUI tier interacts with a custom Java middle-tier library and queries an Oracle 10g database using Hibernate.
- Implemented RESTful services for Account Summary, workable list, etc., for Reports Decouple support.
- Developed and consumed SOAP web services using the JBoss ESB framework.
- Designed and developed Java batch programs in Spring Batch.
- Involved in writing the database schema through Hibernate.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing on Hortonworks.
- Experienced in managing and reviewing Hadoop log files.
- Developed Sqoop scripts to import/export data from the relational source Teradata and handled incremental loading of customer and transaction data by date.
- Tuning of MapReduce configurations to optimize the run time of jobs.
- Developed shell scripts to automate the cluster installation.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Involved in loading data from the UNIX file system into HDFS (a sketch follows this list).
- Created Java operators to process data using DAG streams and load data into HDFS.
- Developed custom Input Formats to implement custom record readers for different datasets.
- Used Pig as ETL tool to do transformations, event joins, filter and some pre-aggregations.
- Automated all jobs for pulling data from FTP server to load data into Hive tables using Oozie workflow.
- Experienced in using the Java REST API to perform CRUD operations on HBase data.
- Created and deployed web services using the REST framework.
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MRUnit testing library.
- Developed unit test cases using the JUnit and MRUnit testing frameworks.
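The following minimal sketch shows the kind of UNIX-file-system-to-HDFS load referred to above, using the standard Hadoop FileSystem API; the local and HDFS paths are illustrative placeholders.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch of copying a file from the local (UNIX) file system into HDFS.
object HdfsLoader {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()        // picks up core-site.xml / hdfs-site.xml on the classpath
    val fs = FileSystem.get(conf)
    val localFile = new Path(args(0))     // e.g. a landing file on the edge node (placeholder)
    val hdfsDir = new Path(args(1))       // e.g. a dated HDFS staging directory (placeholder)
    if (!fs.exists(hdfsDir)) fs.mkdirs(hdfsDir)
    fs.copyFromLocalFile(localFile, hdfsDir)
    fs.close()
  }
}
```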
Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Java, SQL, Cloudera Manager, Sqoop, Flume, Oozie, shell scripts, Java JDK 1.6, Eclipse.
Confidential
Hadoop Developer
Responsibilities:
- Tuning of MapReduce configurations to optimize the run time of jobs.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Developed simple to complex MapReduce jobs using Hive and Pig; developed shell scripts to automate the cluster installation.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Involved in loading data from UNIX file system to HDFS.
- Created sub-queries for filtering and faster execution; created multiple join tables and fetched the required data.
- Responsible for managing data from multiple sources.
- Developed custom Input Formats to implement custom record readers for different datasets.
- Used Pig as ETL tool to do transformations, event joins, filter and some pre-aggregations.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Automated all jobs for pulling data from FTP server to load data into Hive tables using Oozie workflow.
- Experienced in using the Java REST API to perform CRUD operations on HBase data.
- Created and deployed web services using the REST framework.
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MRUnit testing library.
Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Java, SQL, Cloudera Manager, Sqoop, Flume, Oozie, shell scripts, Java JDK 1.6, Eclipse.
Confidential
Java Developer
Responsibilities:
- Involved in gathering system requirements for the application, worked with the business team to review the requirements, and went through the Software Requirement Specification and Architecture documents.
- Involved in intense User Interface (UI) operations and client side validations using AJAX toolkit.
- Responsible for extracting data by screen scraping and for consuming web services using Apache CXF.
- Used Axis and SOAP to expose company applications as web services to outside clients.
- Used the Log package for debugging.
- Used ClearCase for version control.
- Ensured adherence to delivery schedules and quality processes on projects.
- Used web services to create rate summaries, WSDL and SOAP messages to retrieve insurance plans from different modules, and XML parsers for data retrieval.
- Developed business components and integrated them using Spring features such as dependency injection and autowiring of components such as DAO layers and service proxy layers.
- Used Spring AOP to implement distributed declarative transactions throughout the application.
- Wrote Hibernate configuration XML files to manage data persistence.
- Used TOAD to generate SQL queries for the applications, and to see the reports from log tables.
- Involved in migration of Data from Excel, Flat file, Oracle, XML files to SQL Server by using BCP and DTS utility.
Environment: Java/J2EE, MVC architecture with CICS interaction, HTML, Axis, SOAP, Servlets, web services, RESTful web services, Sybase, Spring, DB2, RAD, Apache CXF, Rational ClearCase, WCF, AJAX, Toad, BCP, DTS.
Confidential
Java Developer
Responsibilities:
- Coded front-end components using HTML, JavaScript, and jQuery; back-end components using Java, Spring, and Hibernate; service-oriented components using RESTful and SOAP-based web services; and rules-based components using JBoss Drools.
- Developed the presentation layer using JSP, HTML, CSS, and jQuery.
- Developed JSP custom tags for the front end.
- Wrote JavaScript code for input validation.
- Worked on extracting data from websites by screen scraping and delivering it to the client in the requested format.
- Extensive use of modern front-end JavaScript frameworks, including Bootstrap, jQuery, and AngularJS.
- Used the Apache CXF open-source tool to generate Java stubs from WSDL.
- Developed and consumed SOAP web services using the JBoss ESB framework.
- Developed the Web Services Client using SOAP, WSDL description to verify the credit history of the new customer to provide a connection.
- Developed RESTful web services within the ESB framework and used content-based routing to route to ESBs.
- Designed and developed Java batch programs in Spring Batch.
- Followed test-driven development by maintaining JUnit and FlexUnit test cases throughout the application.
- Developed stand-alone Java batch applications with Spring and Hibernate.
- Involved in writing the database schema through Hibernate.
- Designed and developed DAO layer with Hibernate standards, to access data from IBM DB2.
- Developed the UI panels using JSF, XHTML, CSS, DOJO, and jQuery.
Environment: Java 6 (JDK 1.6), JEE, Spring 3.1 framework, Spring Model View Controller (MVC), Java Server Pages (JSP) 2.0, Servlets 3.0, JDBC 4.0, AJAX, web services, RESTful, JSON, Java Beans, jQuery, JavaScript, Oracle 10g, JUnit, HTMLUnit, XSLT, HTML/DHTML.
