Hadoop Developer Resume

El Paso, TX

PROFESSIONAL SUMMARY:

  • 8+ years of overall IT experience, including 4 years of comprehensive experience as an Apache Hadoop developer.
  • Working experience designing and implementing complete end-to-end Hadoop infrastructure, including Pig, Hive, Sqoop, Oozie, Flume and ZooKeeper.
  • Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN and MapReduce concepts.
  • Experience in providing support to data analysts in running Pig and Hive queries.
  • Experience in providing support to data analyst in running Pig and Hive queries.
  • Experience in writing shell scripts to dump the shared data from MySQL servers to HDFS.
  • Experience in designing both time driven and data driven automated workflows using Oozie.
  • Expertise in writing Hadoop jobs for analyzing structured and unstructured data using HDFS, Hive, HBase, Pig, Spark, Kafka, Scala, Oozie and ETL.
  • Experience working with different kinds of MapReduce programs in Hadoop for Big Data analysis.
  • Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
  • Experience in importing/exporting data using Sqoop into HDFS from relational database systems and vice versa.
  • Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
  • Experience in automating Hadoop installation, configuration and cluster maintenance using tools such as Puppet.
  • Experience in working with Flume to load log data from multiple sources directly into HDFS.
  • Strong debugging and problem-solving skills with excellent understanding of system development methodologies, techniques and tools.
  • Worked through the complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) in different application domains, involving technologies ranging from object-oriented technology to Internet programming on Windows NT, Linux and UNIX/Solaris platforms, following RUP methodologies.
  • Familiar with RDBMS concepts; worked on Oracle 8i/9i, SQL Server 7.0 and DB2 8.x/7.x.
  • Involved in writing shell scripts and Ant scripts on Unix for application deployments to the production region.
  • Very good POC and development experience with Apache Flume, Kafka, Spark, Storm and Scala.
  • Ability to articulate complex statistical concepts and identify key insights from data.
  • Familiar with data mining, machine learning and modeling.
  • Exceptional ability to quickly master new concepts; capable of working in a group as well as independently, with excellent communication skills.
  • Good working knowledge of Hue in the Hadoop ecosystem.
  • Good knowledge in evaluating big data analytics libraries and in using Spark SQL for data exploration.

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, MapReduce, HDFS, Hive, Pig, ZooKeeper, Sqoop, Oozie, Flume, Impala, HBase, Kafka, Storm.

Big Data Frameworks: HDFS, YARN, Spark.

Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks, Amazon EMR, EC2.

Programming Languages: Java, shell scripting, Scala.

Databases: RDBMS, MySQL, Oracle, Microsoft SQL Server, Teradata, DB2, PL/SQL, Cassandra, MongoDB.

IDE and Tools: Eclipse, NetBeans, Tableau.

Operating System: Windows, Linux/Unix.

Frameworks: Spring, Hibernate, JSF, EJB, JMS.

Scripting Languages: JSP & Servlets, JavaScript, XML, HTML, Python.

Application Servers: Apache Tomcat, WebSphere, WebLogic, JBoss.

Methodologies: Agile, SDLC, Waterfall.

Web Services: RESTful, SOAP.

ETL Tools: Talend, Informatica.

Others: Solr, Elasticsearch.

PROFESSIONAL EXPERIENCE:

Confidential, El Paso, TX

Hadoop Developer

Responsibilities:

  • Worked on developing a new application in Spark called PSTL.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Wrote unit and integration tests for the PSTL code base.
  • Streamed data in real time using Spark with Kafka.
  • Worked on HQL for sorting and processing data related to the associated hospitals.
  • Processed patient information, such as billing data, in HQL.
  • Worked on HQL for analyzing structured data in the Metastore.
  • Worked on Spark SQL for analyzing the queries given as input to PSTL.
  • Experienced with batch processing of data sources using Apache Spark.
  • Created Hive tables to store various data formats of PII data coming from different portfolios.
  • Used ZooKeeper for cluster coordination services.
  • Worked on Spark Listener with PSTL to capture the metrics.
  • Experience with Prometheus and Grafana.
  • Worked on the Vertica database, creating tables and storing data in columnar format.
  • Worked on creating Kerberos keytabs for authentication in UAT clusters.
  • Experience debugging the PSTL application.
  • Worked on updating the JDBC drivers for the Vertica DB in terms of SharedSQL.

Environment: Spark, Spark Streaming, Spark SQL, AWS EMR, MapR, HDFS, Hive, Pig, Apache Kafka, Sqoop, Java (JDK SE 6, 7), Scala, Shell scripting, Linux, MySQL, Oracle Enterprise DB, Jenkins, Eclipse, Oracle, Git, Oozie, SOAP, NiFi, Cassandra and Agile methodologies.

Confidential, Ridgefield Park, NJ

Hadoop Developer

Responsibilities:

  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Coordinated with the team that analyzed large data sets to provide strategic direction to the company.
  • Worked on Elasticsearch.
  • Worked on Amazon Web Services (EC2, ELB, VPC, S3, CloudFront and IAM).
  • Coordinated the migration and build-out of AWS infrastructure.
  • Developed software in core Java integrated with Apache Storm and Apache Kafka.
  • Worked on analytics dashboards that process large amounts of data.
  • Worked on analyzing the Hadoop cluster using different big data analytic tools, including Flume, Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Spark and Kafka.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Experience migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
  • Hands-on experience with production Hadoop applications, including development, monitoring, debugging and performance tuning.
  • Translated functional and technical requirements into detailed programs running on Hadoop MapReduce and Spark.
  • Part of the development of a high-throughput message processing system using Kafka and Spark.
  • Used Spark Streaming to collect data from Kafka in near-real-time and perform the necessary transformations.
  • Wrote Hive UDFs in Java and Python.
  • Involved in the process of HBase data modeling and building efficient data structures.
  • Trained and mentored the analyst and test teams on the Hadoop framework, HDFS, MapReduce concepts and the Hadoop ecosystem.
  • Managed the CDN on Amazon CloudFront (origin path: server/S3) to improve site performance.
  • Integrated various state-of-the-art Big Data technologies into the overall architecture.
  • Experience in designing, reviewing and optimizing data transformation processes using Hadoop and Apache Storm.
  • Used different Hadoop components such as Hive, Pig and Spark.
  • Responsible for architecting Hadoop clusters.
  • Wrote shell and Python scripts for job automation.
  • Assisted with adding Hadoop processing to the IT infrastructure.
  • Performed data analysis using Hive and Pig.

Environment: Hadoop, HDFS, Spark, MapReduce, Pig, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Python, Java, SQL Scripting and Linux Shell Scripting, Cloudera, Cloudera Manager, EC2, EMR, S3, AWS

Confidential, Uniondale, NY

Hadoop Developer

Responsibilities:

  • Experienced in defining job flows.
  • Good experience with NoSQL databases.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from the UNIX file system to HDFS.
  • Experienced in managing and reviewing Hadoop log files.
  • Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Responsible for managing data coming from various sources.
  • Involved in review of functional and non-functional requirements.
  • Facilitated knowledge transfer sessions.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Developed MapReduce jobs for users; maintained, updated and scheduled periodic jobs, ranging from updates to periodic MapReduce jobs to ad-hoc jobs for business users.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Developed a custom file system plugin for Hadoop so it can access files on the Data Platform.
  • This plugin allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
  • Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.

Environment: Java 6 (JDK 1.6), Eclipse, Oracle 11g/10g, Subversion, Hadoop, Hive, HBase, Linux, MapReduce, HDFS, Hadoop distributions of Hortonworks and Cloudera, DataStax, IBM DataStage 8.1, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting.

Confidential, Tampa, FL

Hadoop Developer

Responsibilities:

  • Loaded and retrieved unstructured data (CLOB, BLOB, etc.).
  • Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts.
  • Worked on a job automation framework to support and operationalize data loads.
  • Automated the DDL creation process in Hive by mapping the DB2 data types.
  • Monitored Hadoop cluster job performance and performed capacity planning.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked on the proof-of-concept for Apache Hadoop 1.20.2 framework initiation.
  • Installed and configured Hadoop clusters and the ecosystem.
  • Developed automated scripts to install Hadoop clusters.
  • Involved in all phases of the Big Data implementation, including requirement analysis, design, development, building, testing and deployment of a Hadoop cluster in fully distributed mode.
  • Mapped DB2 V9.7 and V10.x data types to Hive data types and performed validations.
  • Experience with the Hadoop framework, HDFS and MapReduce processing implementation.
  • Tuned Hadoop performance for high availability and was involved in recovery of Hadoop clusters.
  • Responsible for coding Java batch jobs, RESTful services, MapReduce programs and Hive queries, as well as testing, debugging, peer code review, troubleshooting and maintaining status reports.
  • Designed business classes and used design patterns such as Data Access Object and MVC.
  • Used Avro and Parquet file formats for data serialization.
  • Good experience with ETL data flow using Informatica PowerCenter.
  • Developed several test cases using MRUnit for testing MapReduce applications.
  • Responsible for troubleshooting and resolving performance issues of the Hadoop cluster.
  • Used bzip2 compression to compress files before loading them into Hive.
  • Used Flume to collect, aggregate and store web log data from different sources, such as web servers and mobile devices, and pushed it to HDFS.

Environment: Hadoop, MapReduce, Flume, Sqoop, Hive, Pig, Web Services, Linux, Core Java, Informatica, HBase, Avro, JIRA, Cloudera, MRUnit, MS SQL Server, UNIX, DB2.

Confidential

Java Developer

Responsibilities:

  • Developed the Application Module using several design patterns like Singleton, DAO, DTO, and MVC.
  • Involved in writing JSPs, JavaScript and Servlets to generate dynamic web pages and web content.
  • Used JBoss application server for application deployment.
  • Implemented the application using agile methodology. Involved in daily scrum and sprint planning meetings.
  • Actively involved in analysis, detail design, development, bug fixing and enhancement.
  • Drove the technical design of the application by collecting requirements from the Functional Unit in the design phase of the SDLC.
  • Developed Microservices using RESTful services to provide all the CRUD capabilities.
  • Created requirement documents and designed requirements using UML class diagrams and use case diagrams for new enhancements.
  • Developed communication among SOA services.
  • Involved in creating both service and client code for JAX-WS, and used SoapUI to generate proxy code from the WSDL to consume the remote service.
  • Designed the user interface of the application using HTML5, CSS3, JavaScript, AngularJS, jQuery and AJAX.
  • Designed Node.js application components using Express.
  • Implemented AJAX functionality to speed up the web application.

Environment: Java, J2EE, Java Swing, HTML, JavaScript, AngularJS, Node.js, JDBC, JSP, Servlet, UML, Hibernate, XML, JBoss, SDLC methodologies, Log4j, GitHub, RESTful, JAX-RS, JAX-WS, Eclipse IDE.

Confidential

Java Developer

Responsibilities:

  • Used JavaScript for validating front-end web pages.
  • Wrote SQL code blocks using cursors to shift records between tables based on checks.
  • Wrote procedures and triggers for validating the consistency of metadata.
  • Used AJAX to make RESTful web service calls.
  • Involved in designing Class and Sequence diagrams with UML and Data flow diagrams.
  • Developed Use Cases, Class Diagrams, Sequence Diagrams and Data Models using Microsoft Visio.
  • Worked on server-side implementation using Struts MVC framework.
  • Developed JSPs with Struts custom tags and implemented JavaScript validation of data.
  • Developed programs for accessing the database using JDBC thin driver to execute queries, prepared statements, Stored Procedures and to manipulate the data in the database.
  • Developed Message Driven Beans for asynchronous processing of alerts.
  • Used IBM ClearCase for source code control and JUnit for unit testing.
  • Used Log4j as the logging framework.
  • Application versions were managed by SVN.
  • Followed coding and documentation standards

Environment: Java, JSP, Struts MVC, Oracle 10g, SQL, PL/SQL, JBoss, JUnit, SQL Developer, Ajax, Maven, Eclipse, SVN, Log4j, REST.
