Sr. Hadoop Developer Resume
Merrimack, NH
PROFESSIONAL SUMMARY:
- 8+ years of professional experience in the IT industry, including 4 years of experience with Big Data and the Hadoop ecosystem, and strong programming experience in Java, Scala, Python, PHP and SQL.
- Solid hands-on experience with Hadoop ecosystem components like Spark, Hive, Impala, MapReduce, Pig, HBase, Sqoop, NiFi, Kafka, YARN and Oozie.
- Strong fundamental understanding of Distributed Systems Architecture and parallel processing frameworks.
- Strong experience designing and implementing end to end data pipelines running on terabytes of data.
- Used Spark and Storm extensively to perform data transformations, data validations and data aggregations.
- Hands-on experience with data ingestion tools like Apache Sqoop for importing data from relational database systems (RDBMS) into HDFS and exporting it back.
- Experience with Apache NiFi and with integrating NiFi and Apache Kafka.
- Experience developing Kafka producers and consumers for streaming millions of events per second.
- Good knowledge of and development experience with the MapReduce framework.
- Proficient in creating Hive DDLs and writing custom Hive UDFs.
- Experience designing Oozie workflows to schedule and manage data flow.
- Good experience in designing and implementing end to end data security and governance within the Hadoop platform using Kerberos.
- Experience working with NoSQL databases like HBase, Cassandra and MongoDB.
- Experience in the ETL process, including data sourcing, transformation, mapping, conversion and loading.
- Experience working with Hadoop distributions such as Cloudera, Hortonworks and Amazon EMR on AWS.
- Created Talend mappings to populate data into dimension and fact tables.
- Experience in Apache Spark Core, Spark SQL, Spark Streaming, Spark ML.
- Experience using different big data file formats such as Avro and the columnar ORC and Parquet formats.
- Experience in working with the integration of Hadoop with Amazon S3, Redshift.
- Good experience in Object Oriented Programming, using Java & J2EE (Servlets, JSP, Java Beans, EJB, JDBC, RMI, XML, JMS, Web Services, AJAX).
- Proficiency in frameworks like Struts, Spring, Hibernate.
- Expertise in working with relational databases like Oracle and DB2.
- Experience in Database design, Database analysis, Entity relationships, Programming SQL.
- Strong expertise in creating shell scripts, regular expressions and cron job automation.
- Good knowledge of Web Services (SOAP, WSDL), XML parsers like SAX and DOM, and front-end technologies such as AngularJS and responsive design with Bootstrap.
- Worked with geographically distributed and culturally diverse teams, including roles that involved interaction with clients and team members.
TECHNICAL SKILLS:
Big Data Eco System: Hadoop, HDFS, MapReduce, Hive, Pig, Impala, HBase, Sqoop, NoSQL (HBase, Cassandra), Spark, Spark SQL, Spark Streaming, Zookeeper, Oozie, NiFi, Kafka, Flume, Hue, Cloudera Manager, Ambari, Amazon AWS, Hortonworks clusters
Java/J2EE & Web Technologies: J2EE, JMS, JSF, Servlets, HTML, CSS, XML, XHTML, AJAX, Angular JS, JSP, JSTL
Languages: C, C++, Java, Shell Scripting, PL/SQL, Python, Pig Latin, Scala
Scripting Languages: JavaScript, UNIX Shell Scripting, Python
Operating system: Windows, MacOS, Linux and Unix
Design: UML, Rational Rose, Microsoft Visio, E-R Modelling
Databases (RDBMS / NoSQL): Oracle 11g/10g/9i, Microsoft SQL Server 2012/2008, MySQL, DB2, Teradata, MongoDB, Cassandra, HBase
IDE and Build Tools: Eclipse, NetBeans, Microsoft Visual Studio, Ant, Jenkins, Docker, Maven, JIRA, Confluence
Version Control: SVN, CVS, GITHUB
Security: Kerberos
Web Services: SOAP, RESTful, JAX-WS
Web Servers: WebLogic, WebSphere, Apache Tomcat, Jetty
PROFESSIONAL EXPERIENCE:
Confidential, Merrimack, NH
Sr. Hadoop Developer
Responsibilities:
- Developed new platform using Hadoop for performing user behavioral analytics.
- Ingested customer profile information from the data warehouse into HDFS using Sqoop.
- Developed custom connectors for pulling marketing and campaign data feeds from FTP servers into HDFS.
- Performed data ingestion from multiple internal clients exposed as REST services using Apache Kafka.
- Created Kafka producers for streaming real-time clickstream events from Adobe REST services into our topics.
- Developed Spark Streaming applications for consuming data from Kafka topics (see the sketch after this list).
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Analyzed the data using Spark DataFrames and a series of Hive scripts to produce summarized results for downstream systems.
- Used Spark SQL to load the metrics from the summarized results into Hive tables in Parquet format.
- Implemented Python Scripts for Auto Deployments in AWS.
- Worked with Spark Data Frames, Spark SQL and Spark MLlib extensively.
- Worked with a team to improve the performance and optimization of existing algorithms in Hadoop using Spark, Spark SQL and DataFrames.
- Implemented Apache Storm spouts and bolts to process data by creating topologies.
- Implemented business logic in Hive and wrote UDFs to process the data for analysis.
- Implemented security on the Hadoop cluster using Kerberos, working with the operations team to move from a non-secured cluster to a secured cluster.
- Created Hive external tables on top of the HDFS data.
- Used Cloudera Manager to manage and monitor Hadoop Stack.
- Used Oozie to define a workflow to coordinate the execution of Spark, Hive and Sqoop jobs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed traits and case classes in Scala.
- Set up Jenkins on AWS EC2 servers and configured notifications to the Jenkins server for any changes to the repository.
- Used Impala to perform interactive querying.
- Developed interactive Dashboards using Tableau connecting to Impala.
- Worked with Data Science team in developing Spark MLlib applications to develop various predictive models.
- Used Jira as an issue tracking tool for design and documentation of run time problems and procedures.
- Interacted with the project team to organize timelines, responsibilities and deliverables and to provide all aspects of technical support.
- Coordinated effectively with the offshore team and managed project deliverables on time.
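A minimal sketch of the Kafka-to-Spark-Streaming consumption pattern described above. The project code was written in Scala; this Java sketch only illustrates the pattern, and the broker address, topic, group id and batch interval are placeholders.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class ClickStreamConsumer {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("ClickStreamConsumer");
        // Micro-batch interval of 10 seconds (illustrative)
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");     // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "clickstream-group");          // placeholder group id
        kafkaParams.put("auto.offset.reset", "latest");

        // Subscribe to the click-stream topic (topic name is a placeholder)
        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                ssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("clickstream-events"), kafkaParams));

        // Extract the event payload and count events per micro-batch as a trivial aggregation
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        ssc.start();
        ssc.awaitTermination();
    }
}
```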
Environment: Hadoop 2.x, Spark, Scala, Hive, Sqoop, Oozie, Kafka, Cloudera Manager, Storm, ZooKeeper, HBase, Impala, YARN, Cassandra, JIRA, Kerberos, Shell Scripting, SBT, GITHUB, Maven.
Confidential, Rockville, MD
Hadoop/Spark Developer
Responsibilities:
- Involved in requirement analysis, design, coding and implementation phases of the project.
- Used Sqoop to load structured data from relational databases into HDFS.
- Loaded transactional data from Teradata using Sqoop and created Hive Tables.
- Worked on automation of delta feeds from Teradata using Sqoop and from FTP Servers to Hive.
- Set up Apache NiFi to transfer structured and streaming data into HDFS.
- Worked with NiFi multi-tenant authorization.
- Worked with different compression codecs like GZIP, SNAPPY and BZIP2 in MapReduce, Pig and Hive for better performance.
- Developed Spark code using Spark SQL for faster processing of data.
- Performed transformations such as de-normalization, cleansing of data sets, date transformations and parsing of complex columns.
- Handled large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
- Handled Avro, JSON and Apache Log data in Hive using custom Hive SerDes.
- Worked on batch processing and scheduled workflows using Oozie.
- Implemented the installation and configuration of a multi-node cluster in the cloud on Amazon Web Services (AWS) EC2.
- Implemented Spark batch applications using Scala for performing various kinds of cleansing, de-normalization and aggregations (see the sketch after this list).
- Deployed Hadoop applications on the multi-node cloud cluster, stored data on S3, and used Elastic MapReduce (EMR) to run MapReduce and Spark jobs.
- Used HiveQL to create partitioned RC and ORC tables and applied compression techniques to optimize data processing and retrieval.
- Implemented Partitioning, Dynamic Partitioning and Buckets in Hive for efficient data access.
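A simplified sketch of the batch cleansing and partitioned-write pattern described above. The project code was written in Scala; this Java sketch is illustrative only, and the input path, column names and target table are placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.to_date;

public class DeltaFeedCleanser {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("DeltaFeedCleanser")
                .enableHiveSupport()
                .getOrCreate();

        // Read a raw JSON delta feed landed in HDFS (path is a placeholder)
        Dataset<Row> raw = spark.read().json("hdfs:///data/raw/delta_feed/");

        // Basic cleansing: drop incomplete records and derive a date column
        Dataset<Row> cleansed = raw
                .na().drop(new String[]{"account_id", "txn_ts"})
                .withColumn("txn_date", to_date(col("txn_ts")));

        // Write as ORC, partitioned by date, into a Hive table (table name is a placeholder)
        cleansed.write()
                .mode("overwrite")
                .partitionBy("txn_date")
                .format("orc")
                .saveAsTable("analytics.transactions_cleansed");
    }
}
```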
Environment: HDFS, Hadoop, NiFi, Kafka, Spark, Pig, Hive, HBase, Sqoop, Teradata, Flume, Map Reduce, Oozie, Java 6/7, Oracle 10g, YARN, UNIX Shell Scripting, Maven, Agile Methodology, JIRA, Linux.
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Developed complex MapReduce jobs in Java to perform data extraction, aggregation and transformation.
- Loaded data into HDFS from different data sources like Oracle and DB2 using Sqoop and loaded it into Hive tables.
- Analyzed big data sets by running Hive queries and Pig scripts.
- Integrated the Hive warehouse with HBase for information sharing among teams.
- Developed the Sqoop scripts for the interaction between Pig and MySQL Database.
- Worked on Static and Dynamic partitioning and Bucketing in Hive.
- Scripted complex HiveQL queries on Hive tables for analytical functions.
- Developed complex Hive UDFs to work with sequence files (a minimal example is sketched after this list).
- Wrote Pig UDFs to cleanse large volumes of incoming data.
- Designed and developed Pig Latin scripts and Pig command line transformations for data joins and custom processing of Map Reduce outputs.
- Installed and configured Tableau Desktop on one of the nodes to connect to the Hortonworks Hive Framework (Database) through the Hortonworks ODBC connector for further analytics of the cluster.
- Created dashboards in Tableau to create meaningful metrics for decision making.
- Performed rule checks on multiple file formats like XML, JSON, CSV and compressed file formats.
- Monitored system health and logs and responded to any warning or failure conditions.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Used storage formats like Avro for efficient record serialization and faster access in complex queries.
- Implemented Counters for diagnosing problem in queries and for quality control and application-level statistics.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Implemented Log4j to trace logs and to track information.
- Developed helper classes to abstract the Cassandra cluster connection, acting as a core toolkit.
- Installed the Oozie workflow engine and used it to schedule data- and time-dependent Hive and Pig jobs.
- Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
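A minimal example of the custom Hive UDF pattern mentioned above; the class name, function name and logic are illustrative placeholders, not the actual project UDFs.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Illustrative Hive UDF that trims and lower-cases a string column.
 * Registered in Hive with (paths and names are placeholders):
 *   ADD JAR /path/to/udfs.jar;
 *   CREATE TEMPORARY FUNCTION clean_text AS 'com.example.CleanTextUDF';
 */
public class CleanTextUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```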
Environment: Hortonworks Data Platform (HDP) Distribution, Ambari, HDFS, MapReduce, Cassandra, Hive, Pig, Sqoop, Tableau, NoSQL, Shell Scripting, Maven, Git, Eclipse, Log4j, JUnit, Linux.
Confidential, Houston, TX
Hadoop/ETL Developer
Responsibilities:
- Extracted data from flat files and other RDBMS databases into a staging area and populated it into the data warehouse.
- Installed and configured Hadoop Map-Reduce, HDFS and developed multiple Map-Reduce jobs in Java for data cleansing and preprocessing.
- Imported and exported data to and from HDFS and Hive using Sqoop.
- Responsible for coding batch pipelines, RESTful services, MapReduce programs and Hive queries, as well as testing, debugging, peer code review, troubleshooting and maintaining status reports.
- Implemented MapReduce programs to classify data into different categories based on record type.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache in Java (see the sketch after this list).
- Wrote Flume configuration files for importing streaming log data into HBase.
- Performed masking on customer sensitive data using Flume interceptors.
- Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generated visualizations using Tableau.
- Developed MapReduce programs and added the external JARs they required.
- Involved in loading data from UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Installed Oozie workflow engine to run multiple Map Reduce jobs.
- Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
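A condensed sketch of the map-side join with the distributed cache referenced above; the lookup file name, key fields and join logic are illustrative placeholders.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Illustrative map-side join: a small lookup file is shipped to every node via the
 * distributed cache, e.g. job.addCacheFile(new URI("/dim/customer_segments.csv#segments")),
 * and joined against the large input split entirely in the mapper.
 */
public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> segmentsById = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        // The cached file is symlinked into the task working directory as "segments"
        try (BufferedReader reader = new BufferedReader(new FileReader("segments"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",");
                segmentsById.put(parts[0], parts[1]);   // e.g. customerId -> segment
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        String segment = segmentsById.getOrDefault(fields[0], "UNKNOWN");
        // Emit the original record enriched with the joined segment
        context.write(new Text(fields[0]), new Text(value.toString() + "," + segment));
    }
}
```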
Environment: Hadoop, MapReduce, HDFS, Hive, Oozie, DynamoDB, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Teradata, Tomcat 6, Tableau.
Confidential
Java Developer
Responsibilities:
- Involved in designing use-case diagrams, class diagrams and interaction diagrams using UML with Rational Rose.
- Developed the application using the MVC 2 web framework design pattern.
- Implemented views using Struts tags, JSTL and Expression Language.
- Used Spring for dependency injection, plugging the Hibernate DAO objects into the business layer (see the sketch after this list).
- Created Spring interceptors to validate web service requests and enable notifications.
- Integrated Hibernate ORM framework with Spring framework for data persistence and transaction management.
- Designed REST APIs that allows sophisticated, effective and low-cost application Integration.
- Wrote Python Scripts to parse XML documents and load the data into the database.
- Worked with core Java concepts such as JVM internals, multithreading and garbage collection.
- Implemented Java Message Services (JMS) using JMS API.
- Adopted J2EE design patterns such as Singleton, Service Locator and Business Facade.
- Developed POJO classes and used annotations to map them to database tables.
- Used the features of Spring Core layer (IOC), Spring MVC, Spring AOP, Spring ORM layer and Spring DAO support layer to develop the application.
- Involved in the configuration of the Struts framework, the Spring framework and the Hibernate mapping tool.
- Used Jasper Reports for designing multiple reports.
- Implemented web service client program to access Affiliates web service using SOAP/REST Web Services.
- Involved in production support, resolving the production issues and maintaining the application server.
- Utilized Agile Methodology/Scrum (SDLC) to manage projects and team.
- Unit tested all classes at the class and method level using JUnit.
- Worked with the testing team on test cases and created test cases from use cases.
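An illustrative fragment of the Spring-wired Hibernate DAO pattern described above; the entity, field names and query are placeholders, and the SessionFactory is assumed to be injected by the Spring IOC container with a current-session context configured.

```java
import java.util.List;

import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.SessionFactory;

/** Illustrative annotated entity (fields are placeholders). */
@Entity
class Customer {
    @Id
    private Long id;
    private String region;
    // getters and setters omitted for brevity
}

/** DAO whose SessionFactory is supplied by the Spring container via setter injection. */
public class CustomerDao {

    private SessionFactory sessionFactory;

    // Called by the Spring IOC container as declared in the application context
    public void setSessionFactory(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    @SuppressWarnings("unchecked")
    public List<Customer> findByRegion(String region) {
        // Assumes an active transaction bound to the current session
        return sessionFactory.getCurrentSession()
                .createQuery("from Customer c where c.region = :region")
                .setParameter("region", region)
                .list();
    }
}
```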
Environment: J2EE, Hibernate, JSF, Rational Rose, Spring 1.2, JSP 2.0, Servlet 2.3, XML, JDBC, JNDI, JUnit, IBM WAS 6.0, RAD 7.0, Oracle 9i, PL/SQL, Log4j, Linux.
Confidential
Java Developer
Responsibilities:
- Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.
- Involved in the implementation of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.
- Developed front-end screens using Struts, JSP, HTML, AJAX, jQuery, JavaScript, JSON and CSS.
- Implemented XSLT transformations of the XML documents in the Spring Web Flow.
- Developed a POJO-based programming model using the Spring framework.
- Used IOC (Inversion of Control) Pattern and Dependency Injection of Spring framework for wiring and managing business objects.
- Used Hibernate framework for Entity Relational Mapping.
- Used Web Services to connect to mainframe for the validation of the data.
- Created and maintained the configuration of the Spring Application Framework (IOC) and implemented business logic using EJB3.
- Developed Web Services utilizing HTTP, XML, XSL and SOAP (a minimal endpoint is sketched after this list).
- Used SOAP as the protocol to send requests and responses in the form of XML messages.
- Used WSDL to expose the Web Services.
- Developed stored procedures, triggers and functions in PL/SQL to process the data, mapped them in the Hibernate configuration file, and established data integrity among all tables.
- Involved in the upgrade of WebSphere and SQL Server.
- Participated in Code Reviews of other modules, documents, test cases.
- Performed unit testing using JUnit and performance and volume testing.
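A small sketch of a SOAP endpoint of the kind described above, using JAX-WS annotations; the service name, operation, URL and validation rule are illustrative placeholders.

```java
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.Endpoint;

/** Illustrative SOAP service; the WSDL is generated from these annotations. */
@WebService
public class AccountValidationService {

    @WebMethod
    public boolean validateAccount(String accountNumber) {
        // Placeholder rule; the real service delegated validation to a mainframe call
        return accountNumber != null && accountNumber.matches("\\d{10}");
    }

    public static void main(String[] args) {
        // Publish the endpoint locally for a quick smoke test (URL is a placeholder)
        Endpoint.publish("http://localhost:8080/ws/account", new AccountValidationService());
    }
}
```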
Environment: Java 1.5/J2EE, JDK, JSP, HTML, CSS, Struts, EJB, JMS, Spring, Hibernate, Eclipse, WebSphere Application Server, Web services (SOAP, REST), JavaScript, PL/SQL, CVS, RAD and Oracle 10g.