Senior Hadoop Developer Resume

Lisle, IL

PROFESSIONAL SUMMARY:

  • 6+ years of professional IT experience, including over 4 years of Big Data ecosystem experience in ingesting, storing, querying, processing and analyzing big data.
  • Strong development skills in Hadoop ecosystem components like HDFS, MapReduce, YARN, ZooKeeper, HBase, Hive, Pig, Sqoop, Flume, Spark, Storm, Kafka, Impala, Oozie.
  • In-depth knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, Resource Manager, Node Manager and MapReduce concepts.
  • Proficient in design and development of MapReduce Programs using Apache Hadoop for analyzing the big data as per the requirement.
  • Expertise in writing Hive and Pig scripts and UDFs to perform data analysis on large data sets.
  • Good understanding of NoSQL databases including HBase, Cassandra, MongoDB.
  • Hands-on experience in using Sqoop to import data from RDBMS into HDFS and to export it from HDFS back to RDBMS.
  • Managed and Scheduled jobs on Hadoop cluster using Apache Oozie.
  • Strong knowledge of Apache Spark in a Scala environment.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Good hands-on experience in creating RDDs and DataFrames for the required input data and performing data transformations using Spark with Scala.
  • Good knowledge of real-time data streaming solutions using Apache Spark Streaming, Kafka and Flume.
  • Worked with various HDFS file formats like Avro, Sequence File and various compression formats like Snappy, bzip2.
  • Developed simple to complex MapReduce streaming jobs in Python that were integrated with Hive and Pig.
  • Skilled in developing applications in Python for multiple platforms.
  • Hands-on experience in application development using Java and Linux shell scripting.
  • Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, Hibernate, EJB.
  • Extensive experience working with Oracle, DB2, SQL Server and MySQL databases, and with PL/SQL.
  • Good knowledge of cloud integration with Amazon Simple Storage Service (S3), Amazon Elastic MapReduce (EMR), Amazon Elastic Compute Cloud (EC2) and Microsoft Azure HDInsight.
  • Followed test-driven development within Agile and Waterfall methodologies to produce high-quality software.
  • Experience in using various IDEs (Eclipse, IntelliJ) and repositories (SVN, Git).
  • Good interpersonal skills; committed, result-oriented and hard-working, with a quest and zeal to learn new technologies.

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, YARN, Zookeeper, Hive, Pig, Sqoop, Flume, Spark, Storm, Impala, Oozie, Kafka.

NoSQL Databases: HBase, Cassandra, MongoDB

Distributions: Cloudera, Hortonworks, Amazon Web Services, Azure.

Languages: C, Java, Scala, Python, SQL, PL/SQL, Pig Latin, HiveQL, JavaScript, Shell Scripting

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB, RESTful

Web Technologies: HTML5, CSS3, JavaScript, JSON, jQuery, Ajax, AngularJS.

Application Servers: WebLogic, WebSphere, JBoss, Tomcat.

Databases: Microsoft SQL Server, MySQL, Oracle, DB2

Operating Systems: UNIX, Windows, LINUX

Build Tools: Jenkins, Maven, ANT

Business Intelligence Tools: Tableau, Splunk, QlikView

Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ

Development Methodologies: Agile/Scrum, Waterfall

Version Control Tools: Git, SVN

PROFESSIONAL EXPERIENCE:

Senior Hadoop Developer

Confidential, Lisle, IL

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Managed jobs using the Fair Scheduler and developed job processing scripts using Oozie workflows.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which gets its data from Kafka in near real time and persists it into Cassandra.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Tuned the performance of Spark applications by setting the right batch interval time, the correct level of parallelism and memory settings.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, Spark broadcasts, efficient joins and transformations during the ingestion process itself.
  • Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
  • Worked on a POC to compare the processing time of Impala with Apache Hive for batch applications, in order to implement the former in the project.
  • Worked on Cluster of size 80 nodes.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Analyzed the SQL scripts and designed the solution to implement using PySpark.
  • Involved in creating Hive tables, and loading and analyzing data using hive queries.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Implemented schema extraction for Parquet and Avro file Formats in Hive.
  • Good experience with Talend Open Studio for designing ETL jobs for processing of data.
  • Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
  • Involved in file movements between HDFS and AWS S3.
  • Extensively worked with S3 bucket in AWS.
  • Good experience with continuous integration of the application using Jenkins.
  • Used reporting tools like Tableau to connect with Hive for generating daily reports of data.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.

Environment: Hadoop YARN, Spark 1.6, Spark Streaming, Spark SQL, Scala, Python, Kafka, Hive, Sqoop 1.4.6, Elasticsearch, Impala, Cassandra, Tableau, Talend, Oozie, Jenkins, Cloudera, AWS S3, Oracle 12c, Linux.

Hadoop Developer

Confidential, Columbia, SC

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on POCs with Apache Spark using Scala to implement Spark in the project.
  • Consumed the data from Kafka using Apache Spark.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Involved in loading data from the Linux file system to HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
  • Created HBase tables to load large sets of semi-structured data coming from various sources.
  • Extended Hive and Pig core functionality by writing custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF) and User Defined Aggregating Functions (UDAF) in Java.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Experienced with performing CRUD operations in HBase.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Responsible for loading data files from various external sources such as Oracle and MySQL into a staging area in MySQL databases.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Actively involved in code review and bug fixing for improving the performance.
  • Good experience in handling data manipulation using Python scripts.
  • Involved in development, building, testing and deployment to the Hadoop cluster in distributed mode.
  • Created Linux shell scripts to automate the daily ingestion of IVR data.
  • Processed the raw data using Hive jobs and scheduled them with crontab.
  • Created HBase tables to store various data formats of incoming data from different portfolios.
  • Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
  • Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
  • Good experience in importing data from Teradata into Hadoop.

Environment: Hadoop, HDFS, Pig, Apache Hive 0.12, Sqoop, Kafka, Apache Spark, Storm, Java, Shell Scripting, HBase 0.96, Python, Agile, ZooKeeper, Maven, Hortonworks 2.0, Teradata, MySQL.

Hadoop Developer

Confidential, Tampa, FL

Responsibilities:

  • Worked on the proof-of-concept for Apache Hadoop 1.20.2 framework initiation.
  • Installed and configured Hadoop clusters and eco-system.
  • Developed automated scripts to install Hadoop clusters.
  • Involved in all phases of the Big Data implementation including requirement analysis, design, development, building, testing and deployment of the Hadoop cluster in fully distributed mode.
  • Mapped DB2 V9.7 and V10.x data types to Hive data types and performed validations.
  • Loaded and retrieved unstructured data (CLOB, BLOB, etc.).
  • Developed Hive jobs to transfer 8 years of bulk data from DB2, MS SQL Server to HDFS layer.
  • Implemented Data Integrity and Data Quality checks in Hadoop using Hive and Linux scripts.
  • Developed a job automation framework to support and operationalize data loads.
  • Automated the DDL creation process in Hive by mapping the DB2 data types.
  • Monitored Hadoop cluster job performance and capacity planning.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Gained experience in implementing processing with the Hadoop framework, HDFS and MapReduce.
  • Tuned Hadoop performance with high availability and was involved in recovery of Hadoop clusters.
  • Responsible for coding Java batch jobs, RESTful services, MapReduce programs and Hive queries, as well as testing, debugging, peer code review, troubleshooting and maintaining status reports.
  • Designed Business classes and used Design Patterns like Data Access Object, MVC etc.
  • Used AVRO, Parquet file formats for serialization of data.
  • Good experience with ETL data flow using Informatica PowerCenter.
  • Developed several test cases using MRUnit for testing MapReduce applications.
  • Responsible for troubleshooting and resolving the performance issues of Hadoop cluster.
  • Used the bzip2 compression technique to compress the files before loading them to Hive.
  • Used Flume to collect, aggregate and store web log data from different sources such as web servers and mobile devices, and pushed it to HDFS.
  • Experience in using HBase as a backend database for application development.
  • Supported and troubleshot Hive programs running on the cluster and was involved in fixing issues arising out of duration testing.
  • Prepared daily and weekly project status reports and shared them with the client.

Environment: Hadoop, MapReduce, Flume, Sqoop, Hive, Pig, Web Services, Linux, Core Java, Informatica, HBase, Avro, JIRA, Cloudera, MRUnit, MS SQL Server, UNIX, DB2.

Java Developer

Confidential

Responsibilities:

  • Implemented the application using Agile methodology. Involved in daily scrum and sprint planning meetings.
  • Actively involved in analysis, detail design, development, bug fixing and enhancement.
  • Drove the technical design of the application by collecting requirements from the Functional Unit in the design phase of the SDLC.
  • Developed microservices using RESTful services to provide all the CRUD capabilities.
  • Created requirement documents and designed the requirements using UML diagrams, class diagrams and use case diagrams for new enhancements.
  • Developed the Application Module using several design patterns like Singleton, DAO, DTO, and MVC.
  • Involved in writing JSPs, JavaScript and Servlets to generate dynamic web pages and web content.
  • Used the JBoss application server for deployment of applications.
  • Developed communication among SOA services.
  • Involved in creation of both service and client code for JAX-WS and used SoapUI to generate proxy code from the WSDL to consume the remote service.
  • Designed the user interface of the application using HTML5, CSS3, JavaScript, AngularJS, jQuery and AJAX.
  • Designed Node.js application components through Express.
  • Implemented AJAX functionality to speed up the web application.
  • Created a single-page application that loads multiple views using route services, using the AngularJS framework to make the user experience more dynamic.
  • Implemented with AngularJS, taking advantage of its features including two-way data binding and templates.
  • Designed the user interface with Java Swing, keeping the business standards in mind.
  • Developed Static and Dynamic pages using JSP and Servlets.
  • Used Hibernate persistence strategy to interact with database.
  • Worked with Session Factory, ORM mapping, Transactions and HQL in Hibernate framework.
  • Used RESTful web services for sending and receiving data between different applications.
  • Wrote client-side and server-side validations using JavaScript validations.
  • Wrote stored procedures and complex SQL queries for backend operations with the database.
  • Devised logging mechanism using Log4j.
  • Used GitHub as the version control system.
  • Created tracking sheets for tasks and generated timely reports on task progress.

Environment: Java, J2EE, Java Swing, HTML, JavaScript, AngularJS, Node.js, JDBC, JSP, Servlet, UML, Hibernate, XML, JBoss, SDLC methodologies, Log4j, GitHub, RESTful, JAX-RS, JAX-WS, Eclipse IDE.
