Hadoop/Spark Developer Resume
Englewood, CO
PROFESSIONAL SUMMARY:
- Over 8 years of professional IT experience in analysis, development, integration and maintenance of web-based and client/server applications using Java and Big Data technologies.
- 4 years of relevant experience in Hadoop Ecosystem and architecture (HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie).
- Experience in real-time analytics with Apache Spark (RDD, DataFrames and Streaming API).
- Used Spark DataFrames API over Cloudera platform to perform analytics on Hive data.
- Experience in integrating Hadoop with Apache Storm and Kafka. Expertise in uploading clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Developed producers for Kafka which compress and bind many small files into larger Avro and Sequence files before writing to HDFS to make best use of Hadoop block size.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice versa.
- Configured Flume to extract the data from the web server output files to load into HDFS.
- Extensive hands-on experience in writing MapReduce jobs in Java.
- Performed data analysis using Hive and Pig. Experience in analyzing large datasets using HiveQL and PigLatin.
- Experience in using Partitioning and Bucketing concepts in Hive; designed both Managed and External tables in Hive for optimized performance.
- Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into PigLatin and HQL (HiveQL); used UDFs from the Piggybank UDF repository.
- Good understanding and knowledge of NoSQL databases like MongoDB, Cassandra and HBase.
- Experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
- Experience in job workflow scheduling and monitoring using Oozie with Python scripting.
- Worked extensively on different Hadoop distributions like Cloudera’s CDH and Hortonworks HDP.
- Good working knowledge of cloud integration with Amazon Web Services (AWS) components like Redshift, DynamoDB, EMR, S3 and EC2 instances.
- Worked with Apache NiFi to develop custom processors for processing and distributing data among cloud systems.
- Good knowledge of Scala programming concepts.
- Expertise in distributed and web environments focused on Core Java technologies like Collections, Multithreading, IO, Exception Handling and Memory Management.
- Expertise in development of Web applications using J2EE technologies like Servlets, JSP, Web Services, Spring, Hibernate, HTML5, JavaScript, jQuery, AJAX, etc.
- Knowledge of standard build and deployment tools such as Eclipse, Scala IDE, Maven, Subversion, SBT.
- Extensive knowledge of the Software Development Lifecycle (SDLC) using Waterfall and Agile methodologies.
- Facilitated sprint planning, daily scrums, retrospectives, stakeholder meetings, and software demonstrations.
- Excellent communication skills with the ability to communicate complex issues to technical and non-technical audiences, including peers, partners, and senior IT and business management.
TECHNICAL SKILLS:
Languages: Java, XML, SQL, PL/SQL, Pig Latin, HiveQL, Python, Scala
Web Technologies: JEE (JDBC, JSP, Servlet, JSF, JSTL), AJAX, JavaScript
Big Data Systems: Hadoop, HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Flume, Oozie, Impala, Spark, Kafka, Storm
RDBMS: Oracle, MySQL, SQL Server, PostgreSQL, Teradata
NoSQL Databases: HBase, MongoDB, Cassandra
App/Web Servers: Apache Tomcat, WebLogic
SOA: Web services, SOAP, REST
Frameworks: Struts 2, Hibernate, Spring 3.x
Version Control Systems: Git, CVS, SVN
IDEs: Eclipse, Scala IDE, NetBeans, IntelliJ IDEA, PyCharm
Operating Systems: UNIX, Linux, Windows
PROFESSIONAL EXPERIENCE
Confidential - Englewood, CO
Hadoop/Spark Developer
Responsibilities:
- Created and worked on Sqoop jobs with incremental load to populate Hive External tables.
- Designed and developed Hive tables to store staging and historical data.
- Created Hive tables as per requirements; internal and external tables were defined with appropriate static and dynamic partitions for efficiency.
- Experience in using ORC file format with Snappy compression for optimized storage of Hive tables.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation, and applied them using the Impala processing engine.
- Created Oozie workflows for Sqoop to migrate the data from source to HDFS and then to target tables.
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
- Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and DataFrames API to load structured and semi-structured data into Spark clusters.
- Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
- Configured Flume to transport web server logs into HDFS.
- Experience with Amazon Web Services (AWS) such as Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic MapReduce (EMR), Amazon SimpleDB and Amazon CloudWatch.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Used Apache Kafka for importing real-time network log data into HDFS.
- Created web-based user interface for creating, monitoring and controlling data flows using Apache NiFi.
Environment: Apache Hadoop, CDH 4.7, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Spark Streaming, Kafka, Linux
Confidential - Madison, WI
Hadoop Developer
Responsibilities:
- Extracted the data from Teradata into HDFS using Sqoop.
- Used Flume to collect, aggregate and store the web log data from different sources like web servers, mobile and network devices and pushed it into HDFS.
- Implemented MapReduce programs on log data to transform it into a structured form and find user information.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views and visit duration.
- Utilized Flume to filter the JSON input data read from the web servers and retrieve only the data required for analytics.
- Developed UDFs for Hive and wrote complex queries in Hive for data analysis.
- Developed a well-structured and efficient ad-hoc environment for functional users.
- Exported the analyzed data to relational databases using Sqoop for visualizations and to generate reports for the BI team.
- Loaded cache data into HBase using Sqoop.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Extensive work in ETL processes consisting of data transformation, data sourcing, mapping, conversion and loading using Informatica.
- Extensively used ETL processes to load data from flat files into the target database by applying business logic on transformation mapping for inserting and updating records when loaded.
- Created Talend ETL jobs to read the data from Oracle Database and import it into HDFS.
- Worked on data serialization formats for converting complex objects into sequences of bits using Avro, RC and ORC file formats.
Environment: Apache Hadoop, Hortonworks HDP 2.0, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Teradata, Talend, Avro, Java, Linux
Confidential - Omaha, NE
Hadoop Developer
Responsibilities:
- Worked on a live 8-node Hadoop cluster running CDH 4.
- Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS).
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports by Business Intelligence tools.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Developed several MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data into HDFS.
- Responsible for creating Hive External tables, loading the data into tables and querying data using HiveQL.
- Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- Enabled concurrent access to Hive tables with shared and exclusive locking, which Hive supports with the help of the ZooKeeper implementation in the cluster.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs as well as system-specific jobs (such as Java programs and shell scripts).
- Created HBase tables to store various data formats coming from different portfolios; worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Used Jenkins for build and continuous integration for software development.
- Worked with application teams to install operating systems, Hadoop updates, patches and version upgrades as required.
Environment: Apache Hadoop, CDH 4, Sqoop, Flume, MapReduce, Pig, Hive, HBase, Cassandra, MongoDB, Oozie, Zookeeper, Jenkins
Confidential
Java Developer
Responsibilities:
- Involved in development of business domain concepts into Use cases, Sequence Diagrams, Class Diagrams, Component Diagrams and Implementation Diagrams.
- Implemented various J2EE Design patterns such as Model-View-Controller (MVC), Data Access Object, Business Delegate and Transfer Object.
- Involved in design and development of the project using Java/J2EE technologies, following an MVC architecture in which JSPs are views and Servlets are controllers.
- Involved in configuring Struts, Tiles and developing the configuration files.
- Developed Struts Action classes and Validation classes using Struts controller component and Struts validation framework.
- Developed and deployed UI layer logic using JSP, XML, JavaScript, HTML/DHTML.
- Designed network and use case diagrams using StarUML to monitor the workflow.
- Wrote server-side programs to handle requests coming from different types of devices using RESTful Web Services.
- Designed a lightweight model for the product using the Inversion of Control principle and implemented it successfully using Spring IOC Container.
- Used Hibernate ORM tool to store and retrieve the data from PostgreSQL database.
- Provided connections using JDBC to the database and developed SQL queries to manipulate the data.
Environment: Java J2EE, Struts MVC, Tiles, JSP, XML, JavaScript, Spring IOC, WebSphere Application Server, PostgreSQL
Confidential
Java Developer
Responsibilities:
- Provided support to the production environment for various applications and actively worked on incidents and issues raised by users. This also involved on-call support during off hours.
- Developed service layer logic for core modules using JSPs and Servlets and involved in integration with the presentation layer.
- Involved in the complete lifecycle of the project, from gathering business requirements to creating an architecture and building applications on Java/J2EE with the Spring MVC framework.
- Used design patterns such as Business Delegate, Data Transfer Object, Service Locator, Data Access Object and Singleton.
- Developed XML configuration and data description using Hibernate. Hibernate Transaction Manager was used to maintain transaction persistence.
- Developed the user interface using JSP and DHTML.
- Involved in fixing bugs and minor enhancements for the front-end module.
Environment: Java, Servlets, JSP, Spring, Hibernate, XML, XPath, jQuery, JavaScript, WebSphere Application Server
