
Hadoop Spark Developer Resume


California

SUMMARY:

  • 8+ years of experience in the IT industry in application development and data analytics using various languages, including 3 years of experience in the Big Data Hadoop ecosystem developing Java code; strong in design, requirement gathering, analysis, development, implementation, and support of applications in the roles of Java Developer and Big Data Hadoop Developer.
  • Proficient in Big Data and ETL practices and technologies such as HDFS, YARN, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Flume, Kafka, Impala, and Spark.
  • Excellent hands-on experience developing Hadoop architectures on Windows and Linux platforms.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Experience in writing MapReduce programs with custom logic based on requirements.
  • Experience in writing custom Hive UDFs based on user requirements.
  • Strong knowledge of using Pig and Hive for processing and analyzing large volumes of data.
  • Expert in extending the core functionality of Hive and Pig by writing custom UDFs in Java and Python based on user requirements.
  • Experienced in writing Hive queries for data analysis and in processing data for visualization using Tableau and Splunk.
  • Extensive experience with data extraction, transformation, and loading (ETL) from disparate data sources such as multiple relational databases (Oracle, SQL Server, and DB2), Informatica, VSAM, and flat files.
  • Expertise in importing and exporting data using Sqoop between HDFS and various database systems (RDBMS, data warehouses, data lakes).
  • Experience in ETL tools like Talend and in-depth knowledge of how data warehouses work.
  • Capable of creating real-time data streaming solutions and batch-style, large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka, and Flume (a brief sketch follows this summary).
  • Knowledge of and experience with job workflow scheduling and monitoring tools such as Autosys, Control-M, Oozie, and ZooKeeper.
  • Proficient in writing shell and Perl scripts and in Unix/Linux commands.
  • Experience in storing and processing unstructured data using NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Good knowledge of AWS concepts such as EC2, S3, and EMR.
  • Experience in performing unit, regression, functional, and integration testing, and experienced in using MRUnit for Hadoop testing.
  • Experience in Object-Oriented Analysis and Design (OOAD) and software development using Java, Scala, and Python on Linux/UNIX platforms.
  • Loaded and transformed data into HDFS from large sets of structured data in Oracle, SQL Server, and Informatica using Talend Big Data Studio.
  • Migrated Informatica code to Hive/Impala.
  • Optimized and performance-tuned HiveQL and formatted table columns using Hive functions.
  • Involved in designing and creating Hive tables to load data into Hadoop and in processing such as merging, sorting, and joining tables.
  • Extensive experience in middle-tier development using J2EE technologies and frameworks such as Struts, RESTful web services, SOA, Spring, and Hibernate, and application servers such as GlassFish and Tomcat.
  • Proficient in working with version control systems such as Git and SVN and with bug-tracking tools such as JIRA.
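A minimal sketch of the kind of Spark Streaming / Kafka ingestion pipeline referred to in this summary. The broker address, topic name, and HDFS path are hypothetical placeholders, and the sketch assumes the spark-streaming-kafka-0-10 integration is on the classpath.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object WebLogIngest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WebLogIngest")
    val ssc  = new StreamingContext(conf, Seconds(30))        // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",                 // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "weblog-ingest",
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream from the (hypothetical) "weblogs" topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Keep only the message payload and land each micro-batch in HDFS for later Hive analysis.
    stream.map(_.value).saveAsTextFiles("hdfs:///data/raw/weblogs/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```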

TECHNICAL SKILLS:

Big Data Ecosystem Components: HDFS, MapReduce, YARN, Hive, Sqoop, Pig, HBase, ZooKeeper, Oozie, Impala, Flume, Kafka, Spark, Talend

Programming Languages: Java, C, C++, Python, PL/SQL, Scala, shell scripting

J2EE Components: JDBC, JSP, Servlets, EJBs, and Design Patterns.

Frameworks: Spring, Hibernate, Struts, RESTful and SOAP web services

Databases: Oracle, MySQL, SQL Server, Cassandra.

Application Servers: Tomcat, GlassFish, WebSphere and WebLogic

Build Tools and Central Repositories: Maven, Ant, Git and SVN.

IDEs: Eclipse, NetBeans, Sublime

Methodology: Agile and Waterfall

Environment: Windows and Linux

PROFESSIONAL EXPERIENCE:

Confidential, California

Hadoop Spark Developer

Responsibilities:

  • Performed data migration from a traditional data warehouse (Teradata) to the Hadoop data lake.
  • Implemented data ingestion from multiple sources such as Teradata and Oracle into Hadoop using Sqoop.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HBase/HDFS for further analysis.
  • Collected log data from web servers and integrated it into HBase using Flume.
  • Developed MapReduce programs that filter out unnecessary records and find unique records based on different criteria.
  • Responsible for performing extensive data validation using Hive.
  • Implemented POCs to migrate iterative MapReduce programs into Spark transformations using Scala.
  • Pushed data as delimited files into HDFS using Talend Big Data Studio.
  • Used various Talend Hadoop components such as Hive, Pig, and Spark.
  • Loaded data into Parquet files by applying transformations using Impala.
  • Implemented and tested analytical solutions in Hadoop.
  • Coordinated with ETL developers on the preparation of Hive and Pig scripts.
  • Utilized SQL scripts to support existing applications.
  • Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as RegEx, JSON, and Avro.
  • Moved data from Oracle and MS SQL Server into HDFS using Sqoop and imported flat files in various formats into HDFS.
  • Migrated complex MapReduce programs into Spark RDD transformations and actions.
  • Improved the performance of existing algorithms in Hadoop using the Spark context, Spark SQL, and Spark on YARN with Scala.
  • Implemented Spark Core in Scala to process data in memory.
  • Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
  • Imported and exported data into HDFS, Hive, and HBase using Sqoop.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
  • Migrated Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance (see the sketch after this list).
  • Used Kafka and Flume to build a robust, fault-tolerant data ingestion pipeline for transporting streaming web log data into HDFS.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Used Maven to build and deploy JARs for MapReduce, Hive, and Pig UDFs.
  • Developed HiveQL to process the data and generate reports for visualization.
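A minimal Scala sketch of the MapReduce-to-Spark migration pattern described above. The input and output paths, column names, and threshold are hypothetical; it only illustrates the shape of the rewrite, first as RDD transformations and then as the equivalent DataFrame / Spark SQL step that replaced the Pig scripts.

```scala
import org.apache.spark.sql.SparkSession

object ClickCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ClickCounts").enableHiveSupport().getOrCreate()
    import spark.implicits._

    // RDD-style rewrite of the old MapReduce job: drop malformed rows, key by user, count.
    val counts = spark.sparkContext
      .textFile("hdfs:///data/raw/clicks")            // hypothetical input path
      .map(_.split("\t"))
      .filter(_.length >= 2)                          // keep only well-formed records
      .map(fields => (fields(0), 1L))                 // (userId, 1)
      .reduceByKey(_ + _)

    // Equivalent DataFrame / Spark SQL step.
    val countsDF = counts.toDF("user_id", "clicks")
    countsDF.createOrReplaceTempView("click_counts")
    spark.sql("SELECT user_id, clicks FROM click_counts WHERE clicks > 100")
      .write.mode("overwrite")
      .parquet("hdfs:///data/out/heavy_users")        // hypothetical output path

    spark.stop()
  }
}
```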

Environment: Hadoop, MapR, MapReduce, HDFS, Hive, Pig, ZooKeeper, PySpark, Spark SQL, Spark Streaming, Scala, Python, Impala, EDW, Maven, Jenkins, Sqoop, Oozie, Kafka, Teradata.

Confidential, NJ

Hadoop Developer

Responsibilities:

  • Processed data into HDFS by developing solutions, analyzed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Implemented Flume multiplexing to stream data from upstream pipes into HDFS.
  • Used Sqoop extensively to import data from various RDBMSs (DB2) into HDFS and to move data between MongoDB and HDFS.
  • Involved in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing SerDes such as RegEx.
  • Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL) and Hive UDFs in Python.
  • Created scalable, high-performance REST web services for data tracking.
  • Applied Hive queries to analyze vast amounts of data in HBase using the storage handler to meet business requirements.
  • Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
  • Worked on NoSQL databases such as HBase, integrating them with a Storm topology written to accept input from a Kafka producer.
  • Planned, implemented, and managed Splunk for log management and analytics.
  • Implemented Scala jobs to integrate and parse real-time data coming from various messaging queues.
  • Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Developed scripts and batch jobs to schedule a bundle (a group of coordinators) consisting of various Hadoop programs using Oozie.
  • Exported the analyzed data to relational databases using Sqoop for visualization and for generating reports for the BI team.
  • Involved in writing Unix/Linux shell scripts for scheduling jobs and in writing Pig scripts and HiveQL.
  • Provided cluster coordination services through ZooKeeper.
  • Involved in upgrading clusters to newer Cloudera Distribution versions.
  • Designed and documented operational problems following standards and procedures using the software reporting tool JIRA.
  • Automated builds upon check-in with Jenkins CI (Continuous Integration).
  • Implemented test scripts to support test-driven development and continuous integration.
  • Assisted in cluster maintenance, cluster monitoring, adding and removing cluster nodes, and troubleshooting.

Environment: Hadoop, MapReduce, HDFS, Sqoop, Flume, Kafka, Linux, Oozie, Python, Splunk, Pig, Scala, ETL, MySQL, JIRA, Hive, Jenkins, HBase, DB2, MongoDB, Cloudera Hadoop cluster.

Confidential, MI

Hadoop Admin/Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration, and monitoring of the Hadoop cluster.
  • Administered cluster maintenance, commissioning and decommissioning of data nodes, cluster monitoring, and troubleshooting.
  • Added and removed nodes in an existing Hadoop cluster.
  • Implemented backup configurations and recovery from a NameNode failure.
  • Monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, and backup and disaster recovery systems and procedures.
  • Implemented multiple MapReduce programs in Java for data analysis.
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the Enterprise Data Warehouse (EDW).
  • Created Hive queries for market analysts to analyze emerging data and compare it with fresh data in EDW reference tables.
  • Collected the logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
  • Designed and presented a plan for a POC on Impala.
  • Involved in migrating HiveQL to Impala to minimize query response time.
  • Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements (see the sketch after this list).
  • Worked on sequence files, ORC and RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Performed extensive data mining applications using Hive.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases (MySQL) to compare with historical data.
  • Worked on Storm real-time processing bolts that save data to Solr and HBase.
  • Built a POC for enabling member and suspect search using Solr.
  • Used Pig as an ETL tool to perform transformations, event joins, filtering, and some pre-aggregations.
  • Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports.
  • Maintained and managed code using Git and used JIRA for bug tracking.
  • Performed complex Linux administrative activities, including creating, maintaining, and updating Linux shell scripts.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
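The Parquet and partitioning work above followed a pattern like the sketch below. Database, table, column names, and dates are hypothetical; the DDL and INSERT are plain HiveQL, shown here submitted through Spark's Hive support only for consistency with the other sketches (on this project such statements would typically have been run from the Hive or Impala shell).

```scala
import org.apache.spark.sql.SparkSession

object ParquetWarehouseLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetWarehouseLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned, Parquet-backed Hive table so Hive and Impala can query the same files.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS edw.orders_parquet (
        |  order_id    STRING,
        |  customer_id STRING,
        |  amount      DOUBLE)
        |PARTITIONED BY (order_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Load one day's refined data into its partition from a (hypothetical) staging table.
    spark.sql(
      """INSERT OVERWRITE TABLE edw.orders_parquet PARTITION (order_date = '2015-06-01')
        |SELECT order_id, customer_id, amount
        |FROM staging.orders_raw
        |WHERE order_date = '2015-06-01'""".stripMargin)

    spark.stop()
  }
}
```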

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Storm, Solr, Flume, JIRA, Control-M, Java, Linux, Maven, ZooKeeper, EDW, Git, ETL, Tableau, Cloudera, MySQL.

Confidential, Dallas, TX

Java Hadoop Developer

Responsibilities:

  • Involved in gathering requirements and converting the requirement into technical specifications.
  • Implemented various OOP concepts and design patterns such as IoC (DI), Singleton, DAO, and Prototype.
  • Designed and implemented the Spring UI layer for the application using Spring MVC and JavaScript.
  • Involved in developing RESTful web services, deployment configurations, and testing using Jersey.
  • Involved in setting up and monitoring the Hadoop cluster along with the administrator.
  • Responsible for managing data coming from different sources.
  • Involved in HDFS maintenance and loading of structured and unstructured data into HDFS.
  • Developed and supported MapReduce programs to perform data filtering on unstructured data and jobs running on the Hadoop cluster.
  • Implemented the import and export of data using Sqoop between MySQL and HDFS on a regular basis.
  • Used Flume to load data from different sources such as file systems and servers into HDFS.
  • Created partitioned Hive tables and wrote Hive queries for data analysis to meet the business requirements.
  • Designed and developed UDFs to extend the functionality of both Pig and Hive (see the sketch after this list).
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Developed Hibernate with Spring integration as the data abstraction layer to interact with the database.
  • Configured Apache HTTP Server and Apache Tomcat Server.
  • Designed and maintained Control-M workflows to manage the flow of jobs in the cluster.
  • Involved in unit testing, developed JUnit test cases, and used mocking frameworks such as Mockito and a REST client UI.
  • Actively updated upper management with daily progress reports on the projects, including the classification levels achieved on the data.
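A minimal sketch of a custom Hive UDF of the kind described above, written in Scala against Hive's standard UDF base class. The package, class name, and normalization logic are hypothetical.

```scala
package com.example.hive.udf                      // hypothetical package

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

/**
 * Hypothetical example: strips everything but digits from a phone-number column.
 * Hive locates the evaluate() method by reflection at query time.
 */
class NormalizePhone extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.replaceAll("[^0-9]", ""))
  }
}
```

Packaged into a JAR, it would be registered from HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION normalize_phone AS 'com.example.hive.udf.NormalizePhone', then called like any built-in function.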

Environment: Java, Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Control-M, Linux, MySQL, J2EE, Spring, Spring MVC, Hibernate, SQL, RESTful Web Services, Apache Tomcat, JUnit, Maven, HTML, JSP.

Confidential

Java developer

Responsibilities:

  • Extensively worked on the Struts framework.
  • Assisted in designing the application using the MVC design pattern.
  • Developed front-end user interface modules by using HTML, XML, Java AWT, and Swing.
  • Carried out front-end validation of user requests using JavaScript.
  • Designed and developed the interacting JSPs and Servlets for modules like User Authentication and Summary Display.
  • Used JConsole for memory management.
  • Developed Action and ActionForm classes, Front Controllers, Singleton classes, Transfer Objects (TO), Business Delegates (BD), Session Façades, Data Access Objects (DAO), and business validators.
  • Analyzed, designed, implemented, and integrated the product into an existing application.
  • Used Quartz on JBoss to schedule jobs.
  • Communicated with other components using JMS within the system.
  • Designed and developed web services, implementing an SOA architecture using SOAP and XML for the module, and published (exposed) the web services.
  • Used JDBC to connect the J2EE server to the relational database.
  • Performed user input validation using JavaScript and developed use cases using UML.
  • Followed extreme programming methodologies for replacing the existing code and testing in a J2EE environment.
  • Developed web pages using HTML5, DOM, CSS3, JSON, JavaScript, JQuery and AJAX.
  • Implemented applications using the Bootstrap framework.
  • Worked on developing internal customer service representative (CSR) tools.
  • Redesigned the service plan page to dynamically display service products based on user selection.
  • Created functional requirements document for rental car industry with telematics system capability and use cases integrating Verizon managed certificate services into the Verizon M2M Management Center.
  • Developed Java classes for the business layer.
  • Developed database objects such as tables, views, stored procedures, and indexes.
  • Involved in testing and fixing bugs.

Environment: Java, J2EE, JSP, Servlets, Struts, HTML, Maven, JavaScript, JDBC, Oracle (PL/SQL), DAO, Tomcat, JUnit, Eclipse.
