Hadoop/Spark Developer Resume
Cumming, GA
SUMMARY
- Over 8 years of IT experience, including work with Big Data ecosystem technologies.
- Around 5 years of experience in Hadoop development.
- Expertise with tools in the Hadoop ecosystem, including HDFS, MapReduce, Hive, Sqoop, Pig, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Excellent knowledge of Hadoop architecture components such as the JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
- Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Experience developing MapReduce programs with Apache Hadoop to analyze big data as per requirements.
- Highly capable of processing large structured, semi-structured, and unstructured datasets supporting Big Data applications.
- Comprehensive knowledge of and experience with process improvement, normalization/denormalization, data extraction, data cleansing, and data manipulation in Hive.
- Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Used ZooKeeper to provide coordination services to the cluster.
- Experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Extensive experience importing and exporting data using stream-processing platforms such as Flume and Kafka.
- Implemented indexing of logs from Oozie to Elasticsearch.
- Analyzed the integration of Kibana with Elasticsearch.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
- Experience creating Spark Contexts, Spark SQL Contexts, and Spark Streaming Contexts to process large datasets (a brief sketch follows at the end of this summary).
- Worked with Big Data distributions such as Cloudera, managed with Cloudera Manager.
- Knowledge of cloud technologies such as AWS and Amazon Elastic MapReduce (EMR).
- Proficient in OOP concepts such as polymorphism, inheritance, and encapsulation.
- Good knowledge of advanced Java topics such as generics, collections, and multithreading.
- Experience in database development using SQL and PL/SQL, working with databases such as Oracle 9i/10g, SQL Server, and MySQL.
- Experience with Enterprise Service Bus (ESB) products such as WebSphere Message Broker.
- Wrote unit test cases using JUnit and MRUnit for MapReduce jobs.
- Experience with development tooling such as GitHub and Jenkins.
- Expertise with application servers and web servers such as WebLogic, IBM WebSphere, and Apache Tomcat, and with VMware.
- Deployed and monitored scalable infrastructure on Amazon Web Services (AWS).
- Knowledge of Splunk architecture and components, including the indexer, forwarder (heavy and universal), search head, deployment server, and license model.
- Comprehensive knowledge of the Software Development Life Cycle (SDLC), with a thorough understanding of phases such as requirements analysis, design, development, and testing.
- Worked in Agile and Waterfall software life cycles, estimating project timelines.
- Ability to quickly master new concepts and applications.
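A minimal sketch of the Spark entry points mentioned above (a batch SparkSession plus a StreamingContext); the application name, batch interval, and socket source are hypothetical and shown only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ContextSetupSketch {
  def main(args: Array[String]): Unit = {
    // SparkSession covers the SparkContext and SQL Context roles for batch jobs.
    val spark = SparkSession.builder()
      .appName("context-setup-sketch")   // hypothetical application name
      .enableHiveSupport()               // lets Spark SQL read/write Hive tables
      .getOrCreate()

    // StreamingContext drives DStream-based micro-batch processing.
    val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

    // Example DStream (hypothetical host/port) so the streaming context has an output operation.
    ssc.socketTextStream("localhost", 9999).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```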
TECHNICAL SKILLS
Big Data Technologies: Hadoop (HDFS & MapReduce), Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Kafka, Spark (Spark Streaming, Spark SQL and DataFrames, MLlib, GraphX), Scala, Elasticsearch, AWS
Programming & Scripting Languages: Java, C, C++, Scala, Python, PHP, SQL, ESQL, Impala
J2EE Technologies: JSP, Servlets, EJB, AngularJS
Web Technologies: HTML, JavaScript
Frameworks: Spring 3.5 (Spring MVC, Spring ORM, Spring Security, Spring Roo), Hibernate, Struts
Application Servers: IBM WebSphere, JBoss, WebLogic
Web Servers: Apache Tomcat
Databases: MS SQL Server & SQL Server Integration Services (SSIS), MySQL, MongoDB, Cassandra, Oracle, Teradata
IDEs: Eclipse, NetBeans
Operating Systems: Unix, Windows, Ubuntu, CentOS
Others: PuTTY, WinSCP, Data Lake, Talend, Tableau, GitHub
PROFESSIONAL EXPERIENCE
Hadoop/Spark Developer
Confidential, Cumming, GA
Responsibilities:
- Setting up the complete Hadoop ecosystem for batch as well as real-time processing; working on a Cloudera Hadoop cluster with 50 data nodes running Red Hat Enterprise Linux.
- Importing terabytes of data into HDFS from relational database systems, and vice versa, using Sqoop.
- Developing ETL processes as needed to load and analyze data from multiple sources using MapReduce, Hive, and Pig Latin scripting.
- Creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Optimizing Hive query performance by implementing partitioning and bucketing.
- Developing user-defined functions for Pig scripts to clean unstructured data, and writing MapReduce jobs in Python to clean and process data.
- Using joins and groupings where needed to optimize Pig scripts.
- Writing Hive jobs on processed data to parse and structure logs, and managing and querying data with HiveQL to facilitate effective querying.
- Analyzing large structured and unstructured datasets to find patterns and insights that support business intelligence in Tableau.
- Integrating MapReduce with HBase to import large volumes of data using MapReduce programs.
- Implementing several workflows with the Apache Oozie framework to automate tasks.
- Using ZooKeeper to coordinate and run different cluster services.
- Using Apache Impala in place of Hive wherever possible to achieve faster results when analyzing data.
- Implementing real-time data ingestion into the cluster using Kafka.
- Writing transformer/mapping MapReduce pipelines in Java.
- Designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala (a brief sketch follows this list).
- Developing and configuring Kafka brokers to pipeline server log data into Spark Streaming.
- Developing Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Exporting analyzed data to relational databases using Sqoop for visualization and report generation.
- Coordinating with teams to resolve technical and functional errors and issues as they arise.
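A minimal sketch of the Hive-to-Spark conversion referenced above, assuming a hypothetical Hive table `sales` with columns `region` and `amount`; the application, table, and output names are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")   // hypothetical application name
      .enableHiveSupport()               // read/write Hive tables through the metastore
      .getOrCreate()

    // HiveQL equivalent: SELECT region, SUM(amount) AS total_amount FROM sales GROUP BY region
    val totals = spark.table("sales")
      .groupBy("region")
      .agg(sum("amount").alias("total_amount"))

    // Persist the result as a Hive table partitioned by region; bucketing would use
    // .bucketBy(numBuckets, column) together with saveAsTable.
    totals.write
      .mode("overwrite")
      .partitionBy("region")
      .saveAsTable("sales_totals_by_region")

    spark.stop()
  }
}
```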
Environment: Hadoop, Cloudera, Pig, Hive, Sqoop, Flume, Kafka, Spark, Storm, Tableau, HBase, Scala, Python, Kerberos, Agile, Zookeeper, Maven, AWS, MySQL.
Hadoop Developer
Confidential, Frisco, TX
Responsibilities:
- Designed, implemented, and deployed the Hadoop cluster.
- Provided solutions to issues using big data analytics.
- Part of the team that built scalable distributed data solutions on a Hadoop cluster using the Hortonworks distribution.
- Loaded data into the Hadoop Distributed File System (HDFS) using Kafka and REST APIs (a brief sketch follows this list).
- Worked on Sqoop to load data into HDFS from relational database management systems.
- Implemented Talend jobs to load and integrate data from Excel sheets using Kafka.
- Developed custom MapReduce programs and user-defined functions (UDFs) in Hive to transform large volumes of data as per requirements.
- Developed analytical Spark jobs in Python.
- Worked with ORC and Avro file formats and compression techniques such as LZO.
- Used Hive on structured data to implement dynamic partitioning and bucketing of Hive tables.
- Transformed large volumes of structured, semi-structured, and unstructured data and analyzed it using Hive queries and Pig scripts.
- Used the Spark API with Hadoop YARN as the execution engine for Hive data analytics.
- Migrated MapReduce programs to Spark transformations using Scala.
- Worked with MongoDB to develop and implement programs in the Hadoop environment.
- Used the Apache Oozie job scheduler to execute workflows as needed.
- Implemented Ambari to track node status and job progress and to run analytical jobs on Hadoop clusters.
- Used Tableau to build customized graphical reports, charts, and worksheets.
- Filtered datasets with Pig UDFs and Pig scripts in HDFS and with bolts in Apache Storm.
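A minimal sketch of Kafka-based ingestion into HDFS as described above; the broker address, topic, consumer group, and HDFS path are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
    val ssc = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",            // hypothetical broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "hdfs-loader",                      // hypothetical consumer group
      "auto.offset.reset" -> "latest"
    )

    // Direct stream from a hypothetical "server-logs" topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("server-logs"), kafkaParams))

    // Each micro-batch of log lines lands as text files under a timestamped HDFS prefix.
    stream.map(_.value).saveAsTextFiles("hdfs:///data/raw/server-logs/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```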
Environment: Hadoop, Pig, Hive, HBase, Sqoop, Spark, Scala, Oozie, Zookeeper, RHEL, Java, Eclipse, SQL, NoSQL, Talend, Tableau, MongoDB.
Hadoop Developer
Confidential, Portland, ME
Responsibilities:
- Used the Cloudera Hadoop distribution. Involved in all phases of the big data implementation, including requirement analysis, design, and development of the Hadoop cluster.
- Used Spark RDDs and Spark SQL to convert MapReduce jobs into Spark transformations using Datasets and Spark DataFrames.
- Wrote Scala for various Spark jobs to analyze customer data, sales history, and other data.
- Collected and aggregated large datasets using Apache Flume, staging the data in HDFS for further analysis.
- Designed, built, and supported pipelines for data ingestion, transformation, conversion, and validation.
- Responded quickly to ad hoc internal and external client requests for data and created ad hoc reports.
- Performed different types of joins on Hive tables and used partitioning, bucketing, and collection concepts in Hive for efficient data access.
- Managed data across different databases, for example ingesting data into Cassandra and consuming the ingested data in Hadoop.
- Created Hive external tables to perform Extract, Transform, and Load (ETL) operations on data generated daily.
- Created HBase tables to serve random-access queries from business intelligence and other teams.
- Created final tables in Parquet format and used Impala to create and manage the Parquet tables (a brief sketch follows this list).
- Implemented real-time data ingestion into the cluster using Apache Kafka.
- Worked on NoSQL databases including HBase and Cassandra.
- Participated in the development and implementation of the Cloudera Impala environment.
- Developed a NoSQL database using CRUD operations, indexing, replication, and sharding in MongoDB.
- Developed the data model to manage the summarized data.
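A minimal sketch of landing a final table in Parquet format so Impala can query it; the database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ParquetFinalTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-final-table-sketch")   // hypothetical application name
      .enableHiveSupport()
      .getOrCreate()

    // Daily staging data assumed to have been prepared earlier in the pipeline.
    val daily = spark.table("staging.daily_events")

    // Write as a Parquet-backed table in the warehouse; Impala can query the same
    // table once its catalog is refreshed (INVALIDATE METADATA / REFRESH in Impala).
    daily.write
      .mode("append")
      .format("parquet")
      .saveAsTable("analytics.daily_events_parquet")

    spark.stop()
  }
}
```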
Environment: Hadoop, Cloudera, Hive, Java, Python, Parquet, Oozie, Cassandra, ZooKeeper, HiveQL/SQL, MongoDB, Tableau, Impala.
Network Engineer
Confidential
Responsibilities:
- Establishing the networking environment by designing system configurations and directing system installation.
- Enforcing system standards and defining protocols.
- Maximizing network performance by monitoring and troubleshooting network problems and outages.
- Setting up policies for data security and network optimization.
- Maintaining the clusters needed for data processing, especially for big data, where many servers are set up on a complex yet efficient network.
- Reporting network operational status by gathering, filtering, and prioritizing the information necessary for optimal network upkeep.
- Keeping the budget low by using available resources efficiently and tracking data transfer and processing speeds.
- Skills applied: budget expense tracking, project management, problem solving, LAN and proxy servers, network design and implementation, network troubleshooting, network hardware configuration, network performance tuning, and people management.
Environment: Wireshark, GNS3, Hadoop, IP addressing, VPN, VLAN, Network Protocols.
Jr. Java Developer
Confidential
Responsibilities:
- Analyzed requirements and specifications in an Agile environment.
- Developed the web interface for the User and Admin modules using JSP, HTML, XML, CSS, JavaScript, AJAX, and action Servlets with the Struts framework.
- Worked extensively on core Java (collections, generics, and interfaces for passing data from the GUI layer to the business layer).
- Analyzed and designed the system based on OOAD principles.
- Used WebSphere Application Server to deploy the build.
- Developed, tested, and debugged the application in Eclipse.
- Used a DOM parser to parse the XML files.
- Used the Log4j framework for logging debug, info, and error data.
- Used the Oracle 10g database for data persistence.
- Transferred files from the local system to other systems using WinSCP.
- Performed Test-Driven Development (TDD) using JUnit.
Environment: J2EE, HTML, JSON, JavaScript, CSS, Struts, Spring, Hibernate, Eclipse, Oracle 10g, SQL, XML, CVS.