Sr. Hadoop Developer Resume

Merrimack, NH

PROFESSIONAL SUMMARY:

  • 8+ years of professional experience in the IT industry, including 4 years with Big Data and the Hadoop ecosystem, and strong programming experience in Java, Scala, Python, PHP and SQL.
  • Solid hands-on experience with Hadoop ecosystem components like Spark, Hive, Impala, MapReduce, Pig, HBase, Sqoop, NiFi, Kafka, YARN and Oozie.
  • Strong fundamental understanding of Distributed Systems Architecture and parallel processing frameworks.
  • Strong experience designing and implementing end-to-end data pipelines running on terabytes of data.
  • Used Spark and Storm extensively to perform data transformations, data validations and data aggregations.
  • Hands-on experience with data ingestion tools like Apache Sqoop for importing data from and exporting data to relational database systems (RDBMS).
  • Experience with Apache NiFi and with integrating NiFi and Apache Kafka.
  • Experience developing Kafka producers and consumers that stream millions of events per second.
  • Good knowledge and development experience with using MapReduce framework.
  • Proficient in creating Hive DDLs and writing custom Hive UDFs (a minimal UDF sketch follows this list).
  • Experience designing Oozie workflows to schedule and manage data flow.
  • Good experience in designing and implementing end-to-end data security and governance within the Hadoop platform using Kerberos.
  • Experience working with NoSQL databases like HBase, Cassandra and MongoDB.
  • Experience in ETL process consisting of data transformation, data sourcing, mapping, conversion and loading.
  • Experience working with various Hadoop Distributions like Cloudera, Amazon AWS and Hortonworks distributions.
  • Created Talend mappings to populate data into dimension and fact tables.
  • Experience in Apache Spark Core, Spark SQL, Spark Streaming, Spark ML.
  • Experience using big data file formats such as Avro (row-oriented) and the columnar ORC and Parquet formats.
  • Experience in working with the integration of Hadoop with Amazon S3, Redshift.
  • Good experience in Object-Oriented Programming using Java & J2EE (Servlets, JSP, JavaBeans, EJB, JDBC, RMI, XML, JMS, Web Services, AJAX).
  • Proficiency in frameworks like Struts, Spring, Hibernate.
  • Expertise in working with RDBMS databases like Oracle and DB2.
  • Experience in Database design, Database analysis, Entity relationships, Programming SQL.
  • Strong expertise in creating Shell-Scripts, Regular Expressions and Cron Job Automation.
  • Good knowledge of Web Services (SOAP, WSDL), XML parsers like SAX and DOM, and front-end technologies including AngularJS and responsive design with Bootstrap.
  • Worked with geographically distributed and culturally diverse teams, in roles involving interaction with clients and team members.
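
To illustrate the custom Hive UDF work mentioned above, here is a minimal sketch in Java using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and the normalization rule are hypothetical examples, not details from any of the projects below.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: normalizes a free-text column (trim + lowercase).
public class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // pass NULLs through unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Such a UDF would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText', and then called like any built-in function.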

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Impala, HBase, Cassandra, Sqoop, Spark, Spark SQL, Spark Streaming, ZooKeeper, Oozie, NiFi, Kafka, Flume, Hue, Cloudera Manager, Ambari, Amazon AWS, Hortonworks clusters

Java/J2EE & Web Technologies: J2EE, JMS, JSF, Servlets, HTML, CSS, XML, XHTML, AJAX, AngularJS, JSP, JSTL

Languages: C, C++, Java, Shell Scripting, PL/SQL, Python, Pig Latin, Scala

Scripting Languages: JavaScript and UNIX Shell Scripting, Python

Operating Systems: Windows, macOS, Linux and UNIX

Design: UML, Rational Rose, Microsoft Visio, E-R Modelling

Databases (RDBMS / NoSQL): Oracle 11g/10g/9i, Microsoft SQL Server 2012/2008, MySQL, DB2, Teradata; MongoDB, Cassandra, HBase

IDE and Build Tools: Eclipse, NetBeans, Microsoft Visual Studio, Ant, Jenkins, Docker, Maven, JIRA, Confluence

Version Control: SVN, CVS, GitHub

Security: Kerberos

Web Services: SOAP, RESTful, JAX-WS

Web Servers: WebLogic, WebSphere, Apache Tomcat, Jetty

PROFESSIONAL EXPERIENCE:

Confidential, Merrimack, NH

Sr. Hadoop Developer

Responsibilities:

  • Developed new platform using Hadoop for performing user behavioral analytics.
  • Ingested customer profile information from the data warehouse into HDFS using Sqoop.
  • Developed custom connectors for pulling marketing and campaign data feeds from FTP servers into HDFS.
  • Performed data ingestion from multiple internal clients exposed as REST calls using Apache Kafka.
  • Created Kafka producers for streaming real-time clickstream events from Adobe REST services into our topics (a producer sketch follows this list).
  • Developed Spark streaming applications for consuming the data from Kafka topics.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Analyzed the data using Spark Data Frames and series of Hive Scripts to produce summarized results from Hadoop to downstream systems.
  • Used Spark SQL to load the metrics from the summarized results into Hive tables in Parquet format.
  • Implemented Python Scripts for Auto Deployments in AWS.
  • Worked with Spark Data Frames, Spark SQL and Spark MLlib extensively.
  • Worked with a team to improve the performance and optimization of existing algorithms in Hadoop using Spark, Spark SQL and DataFrames.
  • Implemented Apache Storm spouts and bolts to process data by creating topologies.
  • Implemented business logic in Hive and wrote UDFs to process the data for analysis.
  • Implemented security on the Hadoop cluster with Kerberos, working with the operations team to move from a non-secured cluster to a secured cluster.
  • Created Hive external tables on top of the HDFS data.
  • Used Cloudera Manager to manage and monitor Hadoop Stack.
  • Used Oozie to define a workflow to coordinate the execution of Spark, Hive and Sqoop jobs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala.
  • Developed traits and case classes in Scala.
  • Set up Jenkins on AWS EC2 servers and configured notifications to the Jenkins server for any changes to the repository.
  • Used Impala to perform interactive querying.
  • Developed interactive Dashboards using Tableau connecting to Impala.
  • Worked with Data Science team in developing Spark MLlib applications to develop various predictive models.
  • Used Jira as an issue tracking tool for design and documentation of run time problems and procedures.
  • Interacted with the project team to organize timelines, responsibilities and deliverables, and to provide all aspects of technical support.
  • Coordinated effectively with the offshore team and managed project deliverables on time.
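
A minimal sketch of a Kafka producer of the kind described above, in Java. The broker address, topic name ("clickstream-events") and JSON payload are illustrative placeholders, assuming the standard Kafka client API.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickStreamProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // wait for full replication before acking

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            String event = "{\"user\":\"u123\",\"page\":\"/home\",\"ts\":1500000000}";
            // Keying by user id keeps one user's events in a single partition,
            // preserving per-user ordering for the downstream Spark consumer.
            producer.send(new ProducerRecord<>("clickstream-events", "u123", event));
        }
    }
}
```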

Environment: Hadoop 2.x, Spark, Scala, Hive, Sqoop, Oozie, Kafka, Cloudera Manager, Storm, ZooKeeper, HBase, Impala, YARN, Cassandra, JIRA, Kerberos, Shell Scripting, SBT, GitHub, Maven.

Confidential, Rockville, MD

Hadoop/Spark Developer

Responsibilities:

  • Involved in requirement analysis, design, coding and implementation phases of the project.
  • Used Sqoop to load structured data from relational databases into HDFS.
  • Loaded transactional data from Teradata using Sqoop and created Hive Tables.
  • Worked on automation of delta feeds from Teradata using Sqoop and from FTP Servers to Hive.
  • Set up Apache NiFi to transfer structured and streaming data into HDFS.
  • Experience working with NiFi multi-tenant authorization.
  • Worked with different compression codecs like GZIP, SNAPPY and BZIP2 in MapReduce, Pig and Hive for better performance.
  • Developed Spark code using Spark SQL for faster processing of data.
  • Performed transformations such as de-normalizing and cleansing data sets, date transformations and parsing of complex columns.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
  • Handled Avro, JSON and Apache Log data in Hive using custom Hive SerDes.
  • Worked on batch processing and scheduled workflows using Oozie.
  • Implemented installation and configuration of multi-node cluster on the cloud using Amazon Web Services (AWS) on EC2.
  • Implemented Spark batch applications using Scala for performing various kinds of cleansing, de-normalization and aggregation.
  • Used cloud computing on the multi-node cluster, deployed the Hadoop application to Amazon S3 and used Elastic MapReduce (EMR) to run MapReduce and Spark jobs.
  • Used HiveQL to create partitioned RC and ORC tables, with compression techniques to optimize data processing and speed up retrieval.
  • Implemented partitioning, dynamic partitioning and buckets in Hive for efficient data access (a sketch of the partitioned-table pattern follows this list).
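
A minimal sketch of the partitioned-table pattern described above, written here with Spark's Java API for consistency with the other examples (the project itself used Scala); the table and column names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DeltaFeedCleanser {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("DeltaFeedCleanser")
                .enableHiveSupport() // read/write Hive metastore tables
                .getOrCreate();

        // Hypothetical staging table, e.g. loaded from Teradata via Sqoop.
        Dataset<Row> raw = spark.table("staging.transactions");

        // De-duplicate, drop rows missing the key, then write a Parquet-backed
        // Hive table partitioned by date for efficient downstream access.
        raw.dropDuplicates(new String[]{"txn_id"})
           .filter("txn_id IS NOT NULL")
           .write()
           .mode("overwrite")
           .format("parquet")
           .partitionBy("txn_date")
           .saveAsTable("curated.transactions");
    }
}
```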

Environment: HDFS, Hadoop, NiFi, Kafka, Spark, Pig, Hive, HBase, Sqoop, Teradata, Flume, MapReduce, Oozie, Java 6/7, Oracle 10g, YARN, UNIX Shell Scripting, Maven, Agile Methodology, JIRA, Linux.

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Developed complex MapReduce jobs in Java to perform data extraction, aggregation and transformation.
  • Loaded data into HDFS from different data sources like Oracle and DB2 using Sqoop, and loaded it into Hive tables.
  • Analyzed big data sets by running Hive queries and Pig scripts.
  • Integrated the Hive warehouse with HBase for information sharing among teams.
  • Developed the Sqoop scripts for the interaction between Pig and MySQL Database.
  • Worked on Static and Dynamic partitioning and Bucketing in Hive.
  • Scripted complex HiveQL queries on Hive tables for analytical functions.
  • Developed complex Hive UDFs to work with sequence files.
  • Wrote Pig UDFs to cleanse large volumes of incoming data.
  • Designed and developed Pig Latin scripts and Pig command line transformations for data joins and custom processing of Map Reduce outputs.
  • Installed and configured Tableau Desktop on one of the nodes to connect to the Hortonworks Hive Framework (Database) through the Hortonworks ODBC connector for further analytics of the cluster.
  • Created dashboards in Tableau to create meaningful metrics for decision making.
  • Performed rule checks on multiple file formats like XML, JSON, CSV and compressed file formats.
  • Monitored system health and logs and responded to any warning or failure conditions.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Used row-oriented storage formats like Avro to retrieve records spanning many columns quickly in complex queries.
  • Implemented Counters for diagnosing problems in jobs, for quality control and for application-level statistics (see the mapper sketch after this list).
  • Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
  • Implemented Log4j to trace logs and to track information.
  • Developed helper classes that abstract Cassandra cluster connections and serve as a core toolkit.
  • Installed the Oozie workflow engine and scheduled data- and time-dependent Hive and Pig jobs.
  • Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
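
A minimal sketch of the Counter pattern referenced above: a Java mapper that tallies malformed and valid records via job counters. The record layout, field positions and counter names are hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class OrderParseMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Counters surface data-quality statistics in the job history UI
    // without failing the job on bad input.
    enum Quality { MALFORMED_RECORDS, VALID_RECORDS }

    private static final IntWritable ONE = new IntWritable(1);
    private final Text outKey = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 3) { // hypothetical minimum field count
            context.getCounter(Quality.MALFORMED_RECORDS).increment(1);
            return; // skip the bad record instead of throwing
        }
        context.getCounter(Quality.VALID_RECORDS).increment(1);
        outKey.set(fields[1]); // e.g. group by customer id
        context.write(outKey, ONE);
    }
}
```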

Environment: Hortonworks Data Platform (HDP) Distribution, Ambari, HDFS, MapReduce, Cassandra, Hive, Pig, Sqoop, Tableau, NoSQL, Shell Scripting, Maven, Git, Eclipse, Log4j, JUnit, Linux.

Confidential, Houston, TX

Hadoop/ETL Developer

Responsibilities:

  • Extracted data from flat files and other relational databases into a staging area and populated it into the data warehouse.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Responsible for coding batch pipelines, RESTful services, MapReduce programs and Hive queries, along with testing, debugging, peer code review, troubleshooting and maintaining status reports.
  • Implemented MapReduce programs to classify data into different categories based on record type.
  • Implemented complex MapReduce programs to perform joins on the map side using the distributed cache in Java (a sketch follows this list).
  • Wrote Flume configuration files for importing streaming log data into HBase.
  • Performed masking on customer sensitive data using Flume interceptors.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop, and later generated visualizations using Tableau.
  • Built MapReduce programs and added the external JARs they required.
  • Involved in loading data from UNIX file system to HDFS.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Installed the Oozie workflow engine to run multiple MapReduce jobs.
  • Performed various performance optimizations such as using the distributed cache for small datasets, and partitioning, bucketing and map-side joins in Hive.
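
A minimal sketch of the map-side join technique described above, in Java. It assumes the driver shipped a small lookup file via job.addCacheFile(new URI("hdfs:/ref/regions.txt#lookup.txt")), whose fragment creates a local symlink named lookup.txt; the file layout and field positions are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    private final Map<String, String> lookup = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        // Load the small side of the join into memory once per task;
        // "lookup.txt" is the symlink created by the cache-file URI fragment.
        try (BufferedReader reader = new BufferedReader(new FileReader("lookup.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2); // key<TAB>region
                lookup.put(parts[0], parts[1]);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Enrich each large-side record in the mapper, avoiding a reduce phase.
        String[] fields = value.toString().split(",");
        String region = lookup.getOrDefault(fields[0], "UNKNOWN");
        context.write(new Text(value + "," + region), NullWritable.get());
    }
}
```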

Environment: Hadoop, MapReduce, HDFS, Hive, Oozie, DynamoDB, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Teradata, Tomcat 6, Tableau.

Confidential

Java Developer

Responsibilities:

  • Involved in designing use-case diagrams, class diagrams and interaction diagrams using UML with Rational Rose.
  • Developed the application following the MVC 2 web framework design pattern.
  • Implemented views using Struts tags, JSTL and Expression Language.
  • Used Spring for dependency injection, plugging in the Hibernate DAO objects for the business layer (a wiring sketch follows this list).
  • Created Spring interceptors to validate web service requests and enable notifications.
  • Integrated Hibernate ORM framework with Spring framework for data persistence and transaction management.
  • Designed REST APIs that allow sophisticated, effective and low-cost application integration.
  • Wrote Python Scripts to parse XML documents and load the data into the database.
  • Worked with core Java concepts like JVM internals, multithreading and garbage collection.
  • Implemented Java Message Service (JMS) messaging using the JMS API.
  • Adopted J2EE design patterns like Singleton, Service Locator and Business Facade.
  • Developed POJO classes and used annotations to map them to database tables.
  • Used the Spring Core layer (IoC), Spring MVC, Spring AOP, the Spring ORM layer and the Spring DAO support layer to develop the application.
  • Involved in the configuration of the Struts framework, the Spring framework and the Hibernate mapping tool.
  • Used Jasper Reports for designing multiple reports.
  • Implemented a web service client program to access the Affiliates web service using SOAP/REST.
  • Involved in production support, resolving the production issues and maintaining the application server.
  • Utilized Agile/Scrum methodology (SDLC) to manage projects and the team.
  • Unit tested all the classes using JUnit at the class and method levels.
  • Worked through the test cases with the testing team and created test cases from use cases.
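
A minimal sketch of the Spring dependency-injection wiring described above, assuming XML-based setter injection as was typical for Spring 1.x; all class, bean and method names here are hypothetical.

```java
// Hypothetical DAO contract, implemented elsewhere by a Hibernate-backed class.
interface AccountDao {
    Account findById(long id);
    void save(Account account);
}

// Hypothetical persistent entity.
class Account {
    private boolean active = true;
    public void setActive(boolean active) { this.active = active; }
}

public class AccountService {

    private AccountDao accountDao;

    // Spring injects the DAO via this setter when wiring the beans:
    //   <bean id="accountService" class="AccountService">
    //     <property name="accountDao" ref="hibernateAccountDao"/>
    //   </bean>
    public void setAccountDao(AccountDao accountDao) {
        this.accountDao = accountDao;
    }

    public void deactivate(long accountId) {
        Account account = accountDao.findById(accountId);
        account.setActive(false);
        accountDao.save(account); // persisted through the Hibernate DAO
    }
}
```

Coding the service against the AccountDao interface means the Hibernate implementation can be swapped (for example, with a mock in JUnit tests) without touching the business logic.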

Environment: J2EE, Hibernate, JSF, Rational Rose, Spring 1.2, JSP 2.0, Servlet 2.3, XML, JDBC, JNDI, JUnit, IBM WAS 6.0, RAD 7.0, Oracle 9i, PL/SQL, Log4j, Linux.

Confidential

Java Developer

Responsibilities:

  • Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.
  • Involved in the implementation of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.
  • Developed front-end screens using Struts, JSP, HTML, AJAX, jQuery, JavaScript, JSON and CSS.
  • Implemented XSLTs for transformations of the XML documents in the Spring Web Flow.
  • Developed a POJO-based programming model using the Spring framework.
  • Used the IoC (Inversion of Control) pattern and the dependency injection features of the Spring framework for wiring and managing business objects.
  • Used the Hibernate framework for object-relational mapping.
  • Used Web Services to connect to mainframe for the validation of the data.
  • Created and maintained the configuration of the Spring application framework (IoC) and implemented business logic using EJB3.
  • Developed web services utilizing HTTP, XML, XSL and SOAP (a minimal endpoint sketch follows this list).
  • SOAP was used as the protocol to send requests and responses in the form of XML messages.
  • WSDL was used to expose the web services.
  • Developed stored procedures, triggers and functions in PL/SQL to process the data, mapped them in the Hibernate configuration file and established data integrity among all tables.
  • Involved in the upgrade of WebSphere and SQL Server.
  • Participated in Code Reviews of other modules, documents, test cases.
  • Performed unit testing using JUnit and performance and volume testing.
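
A minimal sketch of a SOAP endpoint of the kind described above, using the standard JAX-WS annotations; the service name, URL and validation rule are hypothetical stand-ins for the mainframe validation call.

```java
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.Endpoint;

@WebService
public class ValidationService {

    @WebMethod
    public boolean validateAccount(String accountNumber) {
        // Placeholder rule standing in for the real mainframe validation.
        return accountNumber != null && accountNumber.matches("\\d{10}");
    }

    public static void main(String[] args) {
        // Publishes the service; JAX-WS generates and serves the WSDL
        // at http://localhost:8080/validation?wsdl
        Endpoint.publish("http://localhost:8080/validation", new ValidationService());
    }
}
```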

Environment: Java 1.5/J2EE, JDK, JSP, HTML, CSS, Struts, EJB, JMS, Spring, Hibernate, Eclipse, WebSphere Application Server, Web Services (SOAP, REST), JavaScript, PL/SQL, CVS, RAD and Oracle 10g.