Sr. Hadoop Developer Resume
Merrimack, NH
PROFESSIONAL SUMMARY:
- 8+ years of professional experience in the IT industry, including 4 years of experience with Big Data and the Hadoop ecosystem, and strong programming experience in Java, Scala, Python, PHP and SQL.
- Solid hands-on experience with Hadoop ecosystem components like Spark, Hive, Impala, MapReduce, Pig, HBase, Sqoop, NiFi, Kafka, YARN and Oozie.
- Strong fundamental understanding of Distributed Systems Architecture and parallel processing frameworks.
- Strong experience designing and implementing end to end data pipelines running on terabytes of data.
- Used Spark and Storm extensively to perform data transformations, data validations and data aggregations.
- Hands-on experience with data ingestion tools like Apache Sqoop for importing data from relational database systems (RDBMS) into HDFS and exporting it back.
- Experience with Apache NiFi and with integrating NiFi and Apache Kafka.
- Experience developing Kafka producers and consumers for streaming millions of events per second.
- Good knowledge of and development experience with the MapReduce framework.
- Proficient in creating Hive DDLs and writing custom Hive UDFs.
- Experience designing Oozie workflows to schedule and manage data flow.
- Good experience in designing and implementing end to end data security and governance within the Hadoop platform using Kerberos.
- Experience working with NoSQL databases like HBase, Cassandra and MongoDB.
- Experience in the ETL process, including data sourcing, transformation, mapping, conversion and loading.
- Experience working with Hadoop distributions such as Cloudera, Hortonworks and Amazon EMR on AWS.
- Created Talend mappings to populate data into dimension and fact tables.
- Experience in Apache Spark Core, Spark SQL, Spark Streaming, Spark ML.
- Experience using different big data file formats such as Avro and the columnar ORC and Parquet formats.
- Experience in working with the integration of Hadoop with Amazon S3, Redshift.
- Good experience in Object Oriented Programming, using Java & J2EE (Servlets, JSP, Java Beans, EJB, JDBC, RMI, XML, JMS, Web Services, AJAX).
- Proficiency in frameworks like Struts, Spring, Hibernate.
- Expertise in working with relational databases like Oracle and DB2.
- Experience in Database design, Database analysis, Entity relationships, Programming SQL.
- Strong expertise in creating shell scripts, regular expressions and cron job automation.
- Good knowledge of Web Services (SOAP, WSDL), XML parsers like SAX and DOM, and front-end technologies such as AngularJS and responsive design with Bootstrap.
- Worked with geographically distributed and culturally diverse teams, including roles that involved interaction with clients and team members.
TECHNICAL SKILLS:
Big Data Eco System: Hadoop, HDFS, MapReduce, Hive, Pig, Impala, HBase, Sqoop, NoSQL (HBase, Cassandra), Spark, Spark SQL, Spark Streaming, Zookeeper, Oozie, NiFi, Kafka, Flume, Hue, Cloudera Manager, Ambari, Amazon AWS, Hortonworks clusters
Java/J2EE & Web Technologies: J2EE, JMS, JSF, Servlets, HTML, CSS, XML, XHTML, AJAX, Angular JS, JSP, JSTL
Languages: C, C++, Java, Shell Scripting, PL/SQL, Python, Pig Latin, Scala
Scripting Languages: JavaScript, UNIX Shell Scripting, Python
Operating system: Windows, MacOS, Linux and Unix
Design: UML, Rational Rose, Microsoft Visio, E-R Modelling
Databases (RDBMS / NoSQL): Oracle 11g/10g/9i, Microsoft SQL Server 2012/2008, MySQL, DB2, Teradata, MongoDB, Cassandra, HBase
IDE and Build Tools: Eclipse, NetBeans, Microsoft Visual Studio, Ant, Jenkins, Docker, Maven, JIRA, Confluence
Version Control: SVN, CVS, GITHUB
Security: Kerberos
Web Services: SOAP, RESTful, JAX-WS
Web Servers: WebLogic, WebSphere, Apache Tomcat, Jetty
PROFESSIONAL EXPERIENCE:
Confidential, Merrimack, NH
Sr. Hadoop Developer
Responsibilities:
- Developed new platform using Hadoop for performing user behavioral analytics.
- Ingested customer profile information from the data warehouse into HDFS using Sqoop.
- Developed custom connectors for pulling marketing and campaign data feeds from FTP servers into HDFS.
- Performed data ingestion from multiple internal clients exposed as REST services using Apache Kafka.
- Created Kafka producers for streaming real-time clickstream events from Adobe REST services into our topics.
- Developed Spark Streaming applications for consuming data from Kafka topics (see the sketch after this list).
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Analyzed the data using Spark DataFrames and a series of Hive scripts to produce summarized results for downstream systems.
- Used Spark SQL to load the metrics from the summarized results into Hive tables in Parquet format.
- Implemented Python Scripts for Auto Deployments in AWS.
- Worked with Spark Data Frames, Spark SQL and Spark MLlib extensively.
- Worked with a team to improve the performance and optimization of existing algorithms in Hadoop using Spark, Spark SQL and DataFrames.
- Implemented Apache Storm spouts and bolts to process data by creating topologies.
- Implemented business logic in Hive and wrote UDFs to process the data for analysis.
- Implemented security on the Hadoop cluster using Kerberos, working with the operations team to move from a non-secured cluster to a secured cluster.
- Created Hive external tables on top of the HDFS data.
- Used Cloudera Manager to manage and monitor Hadoop Stack.
- Used Oozie to define a workflow to coordinate the execution of Spark, Hive and Sqoop jobs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed traits and case classes in Scala.
- Set up Jenkins on AWS EC2 servers and configured notifications to the Jenkins server for any changes to the repository.
- Used Impala to perform interactive querying.
- Developed interactive Dashboards using Tableau connecting to Impala.
- Worked with Data Science team in developing Spark MLlib applications to develop various predictive models.
- Used Jira as an issue tracking tool for design and documentation of run time problems and procedures.
- Interacted with the project team to organize timelines, responsibilities and deliverables and to provide all aspects of technical support.
- Coordinated effectively with the offshore team and managed project deliverables on time.
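A minimal sketch of the Kafka-to-Spark-Streaming consumption pattern described above. The project code was written in Scala; this Java sketch only illustrates the pattern, and the broker address, topic, group id and batch interval are placeholders.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class ClickStreamConsumer {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("ClickStreamConsumer");
        // Micro-batch interval of 10 seconds (illustrative)
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");     // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "clickstream-group");          // placeholder group id
        kafkaParams.put("auto.offset.reset", "latest");

        // Subscribe to the click-stream topic (topic name is a placeholder)
        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                ssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("clickstream-events"), kafkaParams));

        // Extract the event payload and count events per micro-batch as a trivial aggregation
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        ssc.start();
        ssc.awaitTermination();
    }
}
```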
Environment: Hadoop 2.x, Spark, Scala, Hive, Sqoop, Oozie, Kafka, Cloudera Manager, Storm, ZooKeeper, HBase, Impala, YARN, Cassandra, JIRA, Kerberos, Shell Scripting, SBT, GITHUB, Maven.
Confidential, Rockville, MD
Hadoop/Spark Developer
Responsibilities:
- Involved in requirement analysis, design, coding and implementation phases of the project.
- Used Sqoop to load structured data from relational databases into HDFS.
- Loaded transactional data from Teradata using Sqoop and created Hive Tables.
- Worked on automation of delta feeds from Teradata using Sqoop and from FTP Servers to Hive.
- Set up Apache NiFi to transfer structured and streaming data into HDFS.
- Worked with NiFi multi-tenant authorization.
- Worked with different compression codecs like GZIP, SNAPPY and BZIP2 in MapReduce, Pig and Hive for better performance.
- Developed Spark code using Spark SQL for faster processing of data.
- Performed transformations such as de-normalization, cleansing of data sets, date transformations and parsing of complex columns.
- Handled large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
- Handled Avro, JSON and Apache Log data in Hive using custom Hive SerDes.
- Worked on batch processing and scheduled workflows using Oozie.
- Implemented the installation and configuration of a multi-node cluster in the cloud on Amazon Web Services (AWS) EC2.
- Implemented Spark batch applications using Scala for performing various kinds of cleansing, de-normalization and aggregations (see the sketch after this list).
- Deployed Hadoop applications on the multi-node cloud cluster, stored data on S3, and used Elastic MapReduce (EMR) to run MapReduce and Spark jobs.
- Used HiveQL to create partitioned RC and ORC tables and applied compression techniques to optimize data processing and retrieval.
- Implemented Partitioning, Dynamic Partitioning and Buckets in Hive for efficient data access.
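A simplified sketch of the batch cleansing and partitioned-write pattern described above. The project code was written in Scala; this Java sketch is illustrative only, and the input path, column names and target table are placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.to_date;

public class DeltaFeedCleanser {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("DeltaFeedCleanser")
                .enableHiveSupport()
                .getOrCreate();

        // Read a raw JSON delta feed landed in HDFS (path is a placeholder)
        Dataset<Row> raw = spark.read().json("hdfs:///data/raw/delta_feed/");

        // Basic cleansing: drop incomplete records and derive a date column
        Dataset<Row> cleansed = raw
                .na().drop(new String[]{"account_id", "txn_ts"})
                .withColumn("txn_date", to_date(col("txn_ts")));

        // Write as ORC, partitioned by date, into a Hive table (table name is a placeholder)
        cleansed.write()
                .mode("overwrite")
                .partitionBy("txn_date")
                .format("orc")
                .saveAsTable("analytics.transactions_cleansed");
    }
}
```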
Environment: HDFS, Hadoop, NiFi, Kafka, Spark, Pig, Hive, HBase, Sqoop, Teradata, Flume, Map Reduce, Oozie, Java 6/7, Oracle 10g, YARN, UNIX Shell Scripting, Maven, Agile Methodology, JIRA, Linux.
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Developed complex MapReduce jobs in Java to perform data extraction, aggregation and transformation.
- Loaded data into HDFS from different data sources like Oracle and DB2 using Sqoop and loaded it into Hive tables.
- Analyzed big data sets by running Hive queries and Pig scripts.
- Integrated the Hive warehouse with HBase for information sharing among teams.
- Developed the Sqoop scripts for the interaction between Pig and MySQL Database.
- Worked on Static and Dynamic partitioning and Bucketing in Hive.
- Scripted complex HiveQL queries on Hive tables for analytical functions.
- Developed complex Hive UDFs to work with sequence files (a minimal example is sketched after this list).
- Wrote Pig UDFs to cleanse large volumes of incoming data.
- Designed and developed Pig Latin scripts and Pig command line transformations for data joins and custom processing of Map Reduce outputs.
- Installed and configured Tableau Desktop on one of the nodes to connect to the Hortonworks Hive Framework (Database) through the Hortonworks ODBC connector for further analytics of the cluster.
- Created dashboards in Tableau to create meaningful metrics for decision making.
- Performed rule checks on multiple file formats like XML, JSON, CSV and compressed file formats.
- Monitored system health and logs and responded to any warning or failure conditions.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Used storage formats like Avro for efficient record serialization and faster access in complex queries.
- Implemented Counters for diagnosing problem in queries and for quality control and application-level statistics.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Implemented Log4j to trace logs and to track information.
- Developed helper classes to abstract the Cassandra cluster connection, acting as a core toolkit.
- Installed the Oozie workflow engine and used it to schedule data- and time-dependent Hive and Pig jobs.
- Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
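A minimal example of the custom Hive UDF pattern mentioned above; the class name, function name and logic are illustrative placeholders, not the actual project UDFs.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Illustrative Hive UDF that trims and lower-cases a string column.
 * Registered in Hive with (paths and names are placeholders):
 *   ADD JAR /path/to/udfs.jar;
 *   CREATE TEMPORARY FUNCTION clean_text AS 'com.example.CleanTextUDF';
 */
public class CleanTextUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```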
Environment: Hortonworks Data Platform (HDP) Distribution, Ambari, HDFS, MapReduce, Cassandra, Hive, Pig, Sqoop, Tableau, NoSQL, Shell Scripting, Maven, Git, Eclipse, Log4j, JUnit, Linux.
Confidential, Houston, TX
Hadoop/ETL Developer
Responsibilities:
- Extracted data from flat files and other RDBMS databases into a staging area and populated it into the data warehouse.
- Installed and configured Hadoop Map-Reduce, HDFS and developed multiple Map-Reduce jobs in Java for data cleansing and preprocessing.
- Imported and exported data to and from HDFS and Hive using Sqoop.
- Responsible for coding batch pipelines, RESTful services, MapReduce programs and Hive queries, as well as testing, debugging, peer code review, troubleshooting and maintaining status reports.
- Implemented MapReduce programs to classify data into different categories based on record type.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache in Java (see the sketch after this list).
- Wrote Flume configuration files for importing streaming log data into HBase.
- Performed masking on customer sensitive data using Flume interceptors.
- Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generated visualizations using Tableau.
- Developed MapReduce programs and added the external JARs they required.
- Involved in loading data from UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Installed Oozie workflow engine to run multiple Map Reduce jobs.
- Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
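A condensed sketch of the map-side join with the distributed cache referenced above; the lookup file name, key fields and join logic are illustrative placeholders.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Illustrative map-side join: a small lookup file is shipped to every node via the
 * distributed cache, e.g. job.addCacheFile(new URI("/dim/customer_segments.csv#segments")),
 * and joined against the large input split entirely in the mapper.
 */
public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> segmentsById = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        // The cached file is symlinked into the task working directory as "segments"
        try (BufferedReader reader = new BufferedReader(new FileReader("segments"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",");
                segmentsById.put(parts[0], parts[1]);   // e.g. customerId -> segment
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        String segment = segmentsById.getOrDefault(fields[0], "UNKNOWN");
        // Emit the original record enriched with the joined segment
        context.write(new Text(fields[0]), new Text(value.toString() + "," + segment));
    }
}
```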
Environment: Hadoop, MapReduce, HDFS, Hive, Oozie, DynamoDB, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Teradata, Tomcat 6, Tableau.
Confidential
Java Developer
Responsibilities:
- Involved in designing use-case diagrams, class diagrams and interaction diagrams using UML with Rational Rose.
- Developed the application using the MVC 2 web framework design pattern.
- Implemented views using Struts tags, JSTL and Expression Language.
- Used Spring for dependency injection, plugging the Hibernate DAO objects into the business layer (see the sketch after this list).
- Created Spring interceptors to validate web service requests and enable notifications.
- Integrated Hibernate ORM framework with Spring framework for data persistence and transaction management.
- Designed REST APIs that allows sophisticated, effective and low-cost application Integration.
- Wrote Python Scripts to parse XML documents and load the data into the database.
- Worked with core Java concepts such as JVM internals, multithreading and garbage collection.
- Implemented Java Message Services (JMS) using JMS API.
- Adopted J2EE design patterns such as Singleton, Service Locator and Business Facade.
- Developed POJO classes and used annotations to map them to database tables.
- Used the features of Spring Core layer (IOC), Spring MVC, Spring AOP, Spring ORM layer and Spring DAO support layer to develop the application.
- Involved in the configuration of the Struts framework, the Spring framework and the Hibernate mapping tool.
- Used Jasper Reports for designing multiple reports.
- Implemented web service client program to access Affiliates web service using SOAP/REST Web Services.
- Involved in production support, resolving the production issues and maintaining the application server.
- Utilized Agile Methodology/Scrum (SDLC) to manage projects and team.
- Unit tested all classes at the class and method level using JUnit.
- Worked with the testing team on test cases and created test cases from use cases.
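An illustrative fragment of the Spring-wired Hibernate DAO pattern described above; the entity, field names and query are placeholders, and the SessionFactory is assumed to be injected by the Spring IOC container with a current-session context configured.

```java
import java.util.List;

import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.SessionFactory;

/** Illustrative annotated entity (fields are placeholders). */
@Entity
class Customer {
    @Id
    private Long id;
    private String region;
    // getters and setters omitted for brevity
}

/** DAO whose SessionFactory is supplied by the Spring container via setter injection. */
public class CustomerDao {

    private SessionFactory sessionFactory;

    // Called by the Spring IOC container as declared in the application context
    public void setSessionFactory(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    @SuppressWarnings("unchecked")
    public List<Customer> findByRegion(String region) {
        // Assumes an active transaction bound to the current session
        return sessionFactory.getCurrentSession()
                .createQuery("from Customer c where c.region = :region")
                .setParameter("region", region)
                .list();
    }
}
```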
Environment: J2EE, Hibernate, JSF, Rational Rose, Spring 1.2, JSP 2.0, Servlet 2.3, XML, JDBC, JNDI, JUnit, IBM WAS 6.0, RAD 7.0, Oracle 9i, PL/SQL, Log4j, Linux.
Confidential
Java Developer
Responsibilities:
- Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.
- Involved in the implementation of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.
- Developed front-end screens using Struts, JSP, HTML, AJAX, jQuery, JavaScript, JSON and CSS.
- Implemented XSLT transformations of the XML documents in the Spring Web Flow.
- Developed a POJO-based programming model using the Spring framework.
- Used IOC (Inversion of Control) Pattern and Dependency Injection of Spring framework for wiring and managing business objects.
- Used Hibernate framework for Entity Relational Mapping.
- Used Web Services to connect to mainframe for the validation of the data.
- Created and maintained the configuration of the Spring Application Framework (IOC) and implemented business logic using EJB3.
- Developed Web Services utilizing HTTP, XML, XSL and SOAP (a minimal endpoint is sketched after this list).
- Used SOAP as the protocol to send requests and responses in the form of XML messages.
- Used WSDL to expose the Web Services.
- Developed stored procedures, triggers and functions in PL/SQL to process the data, mapped them in the Hibernate configuration file, and established data integrity among all tables.
- Involved in the upgrade of WebSphere and SQL Server.
- Participated in Code Reviews of other modules, documents, test cases.
- Performed unit testing using JUnit and performance and volume testing.
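A small sketch of a SOAP endpoint of the kind described above, using JAX-WS annotations; the service name, operation, URL and validation rule are illustrative placeholders.

```java
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.Endpoint;

/** Illustrative SOAP service; the WSDL is generated from these annotations. */
@WebService
public class AccountValidationService {

    @WebMethod
    public boolean validateAccount(String accountNumber) {
        // Placeholder rule; the real service delegated validation to a mainframe call
        return accountNumber != null && accountNumber.matches("\\d{10}");
    }

    public static void main(String[] args) {
        // Publish the endpoint locally for a quick smoke test (URL is a placeholder)
        Endpoint.publish("http://localhost:8080/ws/account", new AccountValidationService());
    }
}
```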
Environment: Java 1.5/J2EE, JDK, JSP, HTML, CSS, Struts, EJB, JMS, Spring, Hibernate, Eclipse, WebSphere Application Server, Web services (SOAP, REST), JavaScript, PL/SQL, CVS, RAD and Oracle 10g.