
Sr. Hadoop Developer Resume


Chicago, IL

SUMMARY:

  • 9 years of IT professional experience with emphasis on Design, Development, Implementation, Testing, Maintenance and Deployment of Software Applications using Java, J2EE and Big Data technologies.
  • 4+ years of experience in Big Data technologies and Hadoop ecosystem components like Spark, HDFS, MapReduce, Pig, Hive, YARN, Sqoop, Flume, Kafka and NoSQL systems like HBase, Cassandra.
  • Expertise in architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark Streaming, Kafka, Hive and Impala.
  • Experience in creating complex data ingestion pipelines, data transformations, data management and data governance at Enterprise level.
  • Hands-on experience in installing, configuring and monitoring HDFS clusters (on premises and on AWS cloud).
  • In-depth understanding of MapReduce and AWS cloud concepts and their critical role in the analysis of huge and complex datasets.
  • Created custom database encryption connectors that plug into Sqoop to encrypt data while importing it into HDFS/Hive.
  • Experience in fine-tuning and troubleshooting Spark applications and Hive queries.
  • Extensive hands-on experience in writing complex MapReduce jobs, Pig scripts and HiveQL scripts.
  • Experience using various Hadoop Distributions (Cloudera, Hortonworks, Amazon AWS) to fully implement and leverage new Hadoop features.
  • Experience in using Amazon cloud services such as S3 and EMR.
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in Cloudera Cluster.
  • Experience in using Apache Flume and Kafka for collecting, aggregating and moving large volumes of data from sources such as web servers and telnet sources.
  • Worked on Java HBase API for ingesting processed data to HBase tables.
  • Extensive knowledge and experience in using Apache Storm, Spark Streaming, Apache Spark, Kafka and Flume in creating data streaming solutions.
  • Experienced in working with machine learning libraries (Spark MLlib) and implementing ML algorithms for clustering, regression, filtering and dimensionality reduction (see the sketch following this list).
  • Extensive understanding of Partitions and Bucketing concepts in Hive.
  • Designed and managed external Hive tables with appropriate partition strategies to optimize query performance.
  • Extensive experience in working with semi-structured and unstructured data by implementing complex MapReduce programs using Combiners, Partitioners and design patterns.
  • Created several Hive UDFs to perform complex, business-specific transformations and rules.
  • Expert knowledge of J2EE design patterns such as MVC architecture, Session Facade, Front Controller and Data Access Objects for building J2EE applications.
  • Experienced in using Agile methodologies including extreme programming, Scrum Process and Test-Driven Development (TDD).
  • Intensive work experience in developing enterprise solutions using Java, J2EE, Struts, Servlets, Hibernate, JavaBeans, JDBC, JSP, JSF, JSTL, MVC, Spring, Custom Tag Libraries, JNDI, AJAX, SQL, JavaScript, AngularJS and XML.
  • Conversant with web application servers such as Tomcat, WebSphere, WebLogic and JBoss.
  • Experience in development of logging standards and mechanism based on Log4j.
  • Experience in writing ANT and Maven scripts to build and deploy Java applications.
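
As an illustration of the Spark MLlib work cited above, the following is a minimal Scala sketch of k-means clustering using Spark's DataFrame-based ML API. The table name, feature columns and the choice of k = 5 are hypothetical placeholders, not details taken from any of the engagements below.

    import org.apache.spark.ml.clustering.KMeans
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object CustomerSegmentation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("CustomerSegmentation").getOrCreate()

        // Hypothetical Hive table holding per-customer numeric metrics.
        val raw = spark.table("analytics.customer_metrics")

        // Assemble the numeric columns into a single feature vector column.
        val assembler = new VectorAssembler()
          .setInputCols(Array("total_spend", "visit_count", "days_since_last_visit"))
          .setOutputCol("features")
        val features = assembler.transform(raw)

        // k = 5 is an illustrative choice, not a tuned value.
        val model = new KMeans().setK(5).setSeed(42L).setFeaturesCol("features").fit(features)

        // Attach a cluster label to each customer and persist it for downstream analysis.
        model.transform(features)
          .select("customer_id", "prediction")
          .write.mode("overwrite").saveAsTable("analytics.customer_clusters")

        spark.stop()
      }
    }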

TECHNICAL SKILLS:

Big Data Ecosystem:  Hadoop, HDFS, Spark, Hive, Kafka, Hue, MapReduce, YARN, Pig, Sqoop, HBase, Cassandra, Zookeeper, Oozie, Storm, Flume, Talend, Cloudera Manager, Amazon Web Services (AWS), Hortonworks and Cloudera clusters

Programming Languages:  C, Java, Unix Shell Scripting, AngularJS, PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL

Scripting Languages:  Shell Scripting, JavaScript

Java Tools & Web Technologies:  J2EE, JSF, EJB, HTML, XHTML, AngularJS, Servlets, JSP, JSTL, CSS, CSS, XML, XSL, XSLT, Ajax

Open Source:  Hibernate, Spring AOP, Spring IOC, Spring MVC, Spring Web Flow

Web Services:  SOAP, RESTful, JAX-WS

Web Application Servers:  WebSphere Application Server, WebLogic, Apache Tomcat

Design:  UML, Rational Rose, E-R Modelling, Microsoft Visio

Database:  MongoDB, Cassandra, HBase, Oracle 11g/10g/9i, MySQL, Microsoft SQL Server, Squirrel SQL, Teradata SQL, DB2, RDBMS

Frameworks:  MVC, Spring, Struts, JPA, ORM (Hibernate), JDBC

Visualization:  Tableau and MS Excel

Development Tools:  Eclipse, NetBeans, ANT, Maven and SBT

Version Control Systems:  GitHub, CVS, SVN

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Sr. Hadoop Developer

Responsibilities:

  • Worked on importing and exporting data from DB2 and HP Vertica into Hive using Sqoop for visualization, analysis and report generation.
  • Loaded CSV files containing user event information into Hive external tables on a daily basis.
  • Created Spark applications that used the DataFrame and SparkSQL APIs primarily for event enrichment and for lookups against other enterprise data sources.
  • Built a real-time streaming pipeline using Kafka integrated with Storm and Spark Streaming.
  • Performed ETL on data in different formats such as JSON and CSV, converting it to Parquet while loading to final tables, and ran ad-hoc queries using Hive and Impala.
  • Performed complex data transformations extensively in Spark using Scala.
  • Worked extensively with importing metadata into Hive using Scala and migrated existing tables and applications to work on Hive and AWS cloud.
  • Involved in converting Hive/SQL queries into Spark transformations using Scala.
  • Connected Tableau and Squirrel SQL clients to SparkSQL (Spark Thrift Server) via data sources and ran queries.
  • Worked with machine learning libraries (Spark MLlib) and performed clustering, regression, filtering and dimensionality reduction using the implemented ML algorithms.
  • Configured Spark Streaming to receive real-time data from Kafka and persist the stream to HDFS using Scala as a prototype (see the sketch following this list).
  • Used Impala as the primary analytical tool, allowing visualization servers to connect and report on top of Hadoop directly.
  • Worked with data scientists to implement Spark MLlib algorithms for clustering, regression, filtering and dimensionality reduction.
  • Wrote Spark jobs for data clustering and data processing using Spark MLlib clustering algorithms per functional requirements.
  • Experience in large-scale streaming data analytics using Storm.
  • Worked with the Oozie workflow engine and schedulers to run multiple HiveQL jobs.
  • Worked with various compression codecs and file formats such as Parquet, Snappy, Gzip, Bzip2, Avro and Text.
  • Implemented test scripts to support test driven development and continuous integration.
  • Used Zookeeper to provide coordination services to the cluster.
  • Used Impala and Tableau to create various reporting dashboards.
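
A minimal Scala sketch of the Kafka-to-HDFS streaming prototype mentioned above, assuming the spark-streaming-kafka-0-10 integration is on the classpath; the broker address, topic, consumer group and output path are placeholders, not values from the actual project.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object KafkaToHdfsPrototype {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfsPrototype"), Seconds(30))

        // Placeholder broker list, deserializers and consumer group.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "event-ingest",
          "auto.offset.reset"  -> "latest"
        )
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("user-events"), kafkaParams))

        // Write each non-empty micro-batch to a time-stamped HDFS directory.
        stream.map(_.value).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty())
            rdd.saveAsTextFile(s"hdfs:///data/raw/user_events/batch_${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }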

Environment: HDFS, Hadoop 2.x, Pig, Hive, Sqoop, Flume, Spark, MapReduce, Scala, Oozie, YARN, Tableau, Squirrel SQL, Spark-SQL, Spark-MLlib, Impala, HBase, Cassandra, UNIX Shell Scripting, Storm, Zookeeper, Kafka, Agile Methodology, Cloudera 5.9, SBT.

Confidential, Denver, CO

Sr. Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Installed and configured a multi-node cluster in the cloud on Amazon Web Services (AWS) EC2.
  • Worked on importing and exporting data from Teradata and MySQL into Hive using Sqoop.
  • Involved in importing the real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
  • Developed Hive scripts to perform data transformation and ETL processes.
  • Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from assorted data sources to make it suitable for ingestion into Hive schema for analysis.
  • Designed and developed MapReduce jobs to process data coming in different file formats like XML, CSV, JSON.
  • Transferring data between MySQL and Hadoop Distributed File System (HDFS) using Sqoop with connectors.
  • Creating and populating Hive tables and writing Hive queries for data analysis to meet the business requirements.
  • Running MapReduce jobs to access HBase data from the application using Java client APIs.
  • Developed Pig Latin scripts and HQL queries for the analysis of Structured, Semi-Structured and Unstructured data.
  • Automating the extraction, processing and analysis of data jobs using Oozie.
  • Used SVN for version control.
  • Created working POCs using Spark 1.1.0 Streaming for real-time processing of continuous streams of large data sets.
  • Involved in building and managing NoSQL databases such as HBase and Cassandra.
  • Integrated Cassandra with Storm for real-time user attribute lookups.
  • Worked in Spark to read data from Hive and write it to Cassandra using Java (see the sketch following this list).
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Developed Spark scripts by using Scala and Python Shell commands as per the requirement.
  • Installed and configured Hadoop and HDFS on AWS cloud and was responsible for developing multiple Spark jobs in Scala, R scripts and CLI utilities for data management and event preprocessing.
  • Implemented Spark MLlib machine learning algorithms for clustering, filtering, regression and dimensionality reduction.
  • Worked on Amazon Redshift, the data warehouse product that is part of AWS (Amazon Web Services).
  • Developed Spark and SparkSQL scripts to migrate data from RDBMS into AWS-RedShift.
  • Developed ETL Scripts for Data acquisition and Transformation using Talend.
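
A minimal Scala sketch of the Hive-to-Cassandra flow described above (the job itself was written in Java, and the sketch uses the current SparkSession API rather than the Spark 1.1.0 release listed in the environment). It assumes the DataStax spark-cassandra-connector is on the classpath and that spark.cassandra.connection.host is configured; the database, keyspace, table and column names are hypothetical.

    import org.apache.spark.sql.SparkSession

    object HiveToCassandra {
      def main(args: Array[String]): Unit = {
        // Hive-enabled session so spark.sql can read warehouse tables directly.
        val spark = SparkSession.builder()
          .appName("HiveToCassandra")
          .enableHiveSupport()
          .getOrCreate()

        // Placeholder Hive source table with the attributes to be looked up in real time.
        val userAttrs = spark.sql(
          "SELECT user_id, attribute, value FROM analytics.user_attributes")

        // Placeholder Cassandra keyspace/table written through the spark-cassandra-connector.
        userAttrs.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "lookup", "table" -> "user_attributes"))
          .mode("append")
          .save()

        spark.stop()
      }
    }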

Environment: Hadoop 2.x, Hive, HQL, HDFS, MapReduce, Spark 1.1.0, Scala, Sqoop, Storm, Kafka, Flume, Oozie, HBase, AWS-RedShift, Python, Java, Maven, Eclipse, Putty, AWS-EC2, Talend, Hortonworks.

Confidential, Minnetonka, MN

Hadoop Developer

Responsibilities:

  • Analyzed the functional specifications provided by the client and developed a detailed solution design document with the architect and the team.
  • Used the Hadoop architecture with MapReduce and its ecosystem to solve customer requirements using the Cloudera Distribution for Hadoop (CDH).
  • Developed multiple MapReduce jobs in Java for complex business requirements, including data cleansing and preprocessing.
  • Created Data Lake which serves as a base layer to store and do analytics on data flowing from multiple sources into Hadoop Platform.
  • Involved in creating Hive tables, loading data and writing Hive queries that run internally as MapReduce jobs.
  • Optimized Hive tables using techniques such as partitioning and bucketing to provide better performance for HiveQL queries (see the sketch following this list).
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Scripted complex HiveQL queries on Hive tables for analytical functions by implementing Hive Generic UDFs.
  • Wrote custom User Defined Functions (UDFs) in Java and Perl to ease further processing in Pig.
  • Involved in building applications using Maven and integrated them with continuous integration servers such as Jenkins to build jobs.
  • Worked on importing and exporting data between legacy RDBMS databases and HDFS/Hive using Sqoop.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
  • Designing NoSQL schemas in HBase and Cassandra.
  • Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
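
A minimal sketch of the partitioning approach mentioned above, written as HiveQL issued through a Hive-enabled SparkSession so the examples stay in Scala (the same DDL runs unchanged in the Hive CLI). The database, table, columns, location and dates are placeholders; bucketing would add a CLUSTERED BY ... INTO n BUCKETS clause to the same DDL.

    import org.apache.spark.sql.SparkSession

    object ClickEventsTable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ClickEventsTable")
          .enableHiveSupport()
          .getOrCreate()

        // External table partitioned by day so queries can prune to a single partition.
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS web.click_events (
            |  user_id    STRING,
            |  page_url   STRING,
            |  event_time TIMESTAMP
            |)
            |PARTITIONED BY (event_date STRING)
            |STORED AS PARQUET
            |LOCATION '/data/warehouse/web/click_events'""".stripMargin)

        // A typical analytical query touches only one day's files thanks to partition pruning.
        spark.sql(
          """SELECT page_url, COUNT(*) AS hits
            |FROM web.click_events
            |WHERE event_date = '2017-06-01'
            |GROUP BY page_url""".stripMargin).show()

        spark.stop()
      }
    }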

Environment: Hadoop 1.x, HDFS, MapReduce, Hive, Pig, HBase, Tez, Sqoop, Oozie, Maven, Shell Scripting, Teradata, CDH3, Cloudera Manager.

Confidential, Raleigh, NC

Java Application Developer

Responsibilities:

  • Involved in distinct phases of the Software Development Life Cycle, including requirement gathering, data modeling, analysis, and architecture design and development for the project.
  • Worked in an Agile environment; responsibilities included analysis of the various applications, coordination with the client and meetings with business users.
  • Used the JSF and Spring frameworks in the application, which is based on the MVC design pattern.
  • Used the Spring Framework for dependency injection, security and integration with the Hibernate framework.
  • Wrote JavaScript, HTML, DHTML, CSS, Servlets and JSP to design the GUI of the application.
  • Designed and developed the user interface using JSP, JSTL, JSF, Custom Tag Libraries, AJAX, jQuery and AngularJS to speed up the application.
  • Extensively used Hibernate in the data access layer to access and update information in the database.
  • Used AngularJS and jQuery to manipulate DOM objects for Ajax calls and for the user interface look and feel.
  • Extensive knowledge of database practices, structures, principles and theories.
  • Used PL/SQL to manage data and create tables, and performed unit testing using the JUnit framework.
  • Implemented RESTful web services using AngularJS.
  • Used Maven to build the application.
  • Monitored the error logs using Log4J and fixed the problems.
  • Developed the different components of application using STS 3.4 and used SVN for version control.

Environment: Java/J2EE, Tomcat 7.0, Spring 3.1, Hibernate 3.2, JavaScript, HTML, DHTML, CSS, Servlets, JSP, JSTL, XML, SOAP, Web Services, PL/SQL, JDBC, jQuery, Ajax, Maven, MVC, Log4J, Unix, AngularJS, JSF 1.2, STS 3.4, SVN.
