Hadoop Developer / Spark Developer Resume

San Francisco, CA

PROFESSIONAL SUMMARY:

  • Over 8 years of professional IT experience, including 5+ years in Big Data, Hadoop development, and data analytics, plus design and development of Java-based enterprise applications.
  • Very strong knowledge of Hadoop ecosystem components such as HDFS, Spark, Hive, HBase, Sqoop, Kafka, MapReduce, Pig, and Oozie.
  • Strong knowledge of distributed systems architecture and parallel processing frameworks.
  • In-depth understanding of the internals of the MapReduce framework and the Spark execution model.
  • Expertise in developing production-ready Spark applications using the Spark Core, DataFrames, Spark SQL, Spark ML, and Spark Streaming APIs.
  • Experience with different Hadoop distributions, including Cloudera (CDH 3, 4, and 5), Hortonworks Data Platform (HDP), and Amazon Elastic MapReduce (EMR).
  • Worked extensively on fine-tuning long-running Spark applications to improve parallelism and make better use of executor memory for caching.
  • Strong experience with both batch and real-time processing using the Spark framework.
  • Strong knowledge of performance tuning of Hive queries and troubleshooting issues such as problematic joins and memory exceptions in Hive.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Used custom SerDes (Regex SerDe, JSON SerDe, CSV SerDe, etc.) in Hive to handle multiple data formats.
  • Strong experience with file formats such as Avro and the columnar formats RCFile, ORC, and Parquet.
  • Hands-on experience in installing, configuring, and deploying Hadoop distributions in cloud environments (Amazon Web Services).
  • Experience in optimizing MapReduce jobs using combiners and custom partitioners.
  • Expertise in back-end/server-side Java technologies such as web services, Java Persistence API (JPA), Java Message Service (JMS), and Java Database Connectivity (JDBC).
  • Experience with NoSQL databases such as HBase and Apache Cassandra (column-oriented) and MongoDB (document-oriented), and their integration with Hadoop clusters.
  • Experienced in writing custom MapReduce programs and UDFs in Java to extend core Hive and Pig functionality (a minimal Hive UDF sketch follows this list).
  • Extensive experience with ETL processes covering data sourcing, mapping, transformation, conversion, and loading.
  • Created Talend mappings to populate dimension and fact tables.
  • Broad design, development, and testing experience with Talend Integration Suite, including performance tuning of mappings.
  • Worked with Sqoop to move (import/export) data between relational databases and Hadoop.
  • Experience working with Hadoop clusters on Cloudera, Hortonworks, and Amazon AWS distributions.
  • Experience in the installation, configuration, support, and management of Hadoop clusters.
  • Knowledge of UNIX shell scripting for automating deployments and other routine tasks.
  • Experienced in agile methodologies, including Extreme Programming, Scrum, and Test-Driven Development (TDD).
  • Proficient in integrating and configuring the object-relational mapping tool Hibernate in J2EE applications, along with other open source frameworks such as Struts and Spring.
  • Experience in building and deploying web applications on multiple application servers and middleware platforms, including WebLogic, WebSphere, Apache Tomcat, and JBoss.
  • Experience in writing test cases in Java using JUnit.
  • Hands-on experience in developing logging standards and mechanisms based on Log4j.
  • Experience in building, deploying, and integrating applications with Ant and Maven.
  • Good knowledge of REST and SOAP web services, WSDL, and XML parsers such as SAX.
  • Flexible, enthusiastic, project-oriented team player with excellent communication and leadership skills, able to develop creative solutions for challenging client requirements.
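
A minimal sketch of the kind of Java UDF for Hive mentioned above (the class name and normalization logic are hypothetical, using the classic org.apache.hadoop.hive.ql.exec.UDF API):

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: trims a US zip code column down to its first five digits.
    public final class NormalizeZip extends UDF {
        public Text evaluate(Text zip) {
            if (zip == null) {
                return null;
            }
            String digits = zip.toString().replaceAll("[^0-9]", "");
            return new Text(digits.length() >= 5 ? digits.substring(0, 5) : digits);
        }
    }

Packaged into a JAR, such a UDF would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_zip AS 'NormalizeZip'.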

TECHNICAL SKILLS:

Languages: Java, Scala, Python, SQL, Pig Latin, HiveQL, Shell Scripting

Big Data/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, ZooKeeper and Oozie

NoSQL Databases: HBase, Cassandra, MongoDB

J2EE/Middleware: J2EE (Servlets 2.4, JSP 2.0, JDBC, JMS)

Database: Microsoft SQL Server, MySQL, Oracle, DB2

Cloud Computing Tools: Amazon AWS

Development Tools: Microsoft SQL Studio, Eclipse, IntelliJ

Development Methodologies: Agile/Scrum, Waterfall

GUI Technologies: HTML, XHTML, CSS, JavaScript, Ajax, AngularJS

Web/App Servers: WebLogic, WebSphere

Operating Systems: UNIX, Linux, Windows, Mac

Office Suite: Microsoft Office (Word/Excel/PowerPoint)

PROFESSIONAL EXPERIENCE:

Confidential, San Francisco, CA

Hadoop Developer / Spark Developer

Responsibilities:

  • Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
  • Developed custom multi-threaded Java-based ingestion jobs, as well as Sqoop jobs, for ingesting data from FTP servers and data warehouses.
  • Developed many Spark applications for data cleansing, event enrichment, data aggregation, de-normalization, and data preparation needed for machine learning.
  • Troubleshot Spark applications to make them more fault tolerant.
  • Fine-tuned Spark applications to improve overall pipeline processing time.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics (a minimal producer sketch follows this list).
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
  • Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations, and other features.
  • Worked extensively with Sqoop for importing data from Oracle.
  • Experience working with EMR clusters in the AWS cloud and with S3.
  • Involved in creating Hive tables and loading and analyzing data using Hive scripts.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Good experience with continuous integration of applications using Jenkins.
  • Used reporting tools such as Tableau, connected to Impala, to generate daily data reports.
  • Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.
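
A minimal sketch of the Kafka producer pattern referenced in the list above (the broker address, topic name, and payload are illustrative; in the actual jobs the payload came from external REST APIs):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // illustrative broker address
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // A static JSON string stands in for an event fetched from a REST API.
                String event = "{\"userId\":\"42\",\"action\":\"click\"}";
                producer.send(new ProducerRecord<>("user-events", "42", event));
            }
        }
    }

The Spark Streaming consumers described above would then read the same topic and write the processed stream to HBase.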

Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, Amazon AWS, HBase, Tableau, Oozie, Oracle, Linux

Confidential, Minneapolis, MN

Hadoop/Spark Developer

Responsibilities:

  • Used Cloudera distribution extensively.
  • Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames, and the Spark SQL API.
  • Developed Spark programs for batch processing.
  • Wrote new Spark jobs in Scala to analyze customer and sales history data.
  • Worked on Spark SQL and Spark Streaming.
  • Developed Spark scripts using the Scala shell as required.
  • Used Kafka to get data from many streaming sources into HDFS.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Good experience with Hive partitioning, bucketing, and collections, and with performing different types of joins on Hive tables.
  • Created Hive external tables to perform ETL on data generated on a daily basis.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Wrote HBase bulk-load jobs to load processed data into HBase tables by converting it to HFiles.
  • Performed validation on the data ingested to filter and cleanse the data in Hive.
  • Used Spark SQL with Scala to create DataFrames and performed transformations on them.
  • Created Sqoop jobs to handle incremental loads from RDBMS sources into HDFS and applied Spark transformations.
  • Implemented Spark SQL to access Hive tables from Spark for faster data processing.
  • Loaded data into Hive tables from Spark using the Parquet columnar format (see the sketch after this list).
  • Developed Oozie workflows to automate and productionize the data pipelines.
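
The jobs above were written in Scala; purely as an illustration, here is a minimal Java sketch of the same pattern of reading a Hive table through Spark SQL and persisting the result back in the Parquet columnar format (the database and table names are hypothetical):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SalesAggregation {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("SalesAggregation")
                    .enableHiveSupport() // lets Spark SQL read and write Hive tables
                    .getOrCreate();

            // Aggregate an existing Hive table via Spark SQL.
            Dataset<Row> totals = spark.sql(
                    "SELECT customer_id, SUM(amount) AS total_amount "
                  + "FROM sales.orders GROUP BY customer_id");

            // Write the result back to Hive as a Parquet table.
            totals.write()
                  .format("parquet")
                  .mode("overwrite")
                  .saveAsTable("sales.customer_totals");

            spark.stop();
        }
    }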

Environment: Hadoop, Hive, Flume, Shell Scripting, Java, Eclipse, HBase, Kafka, Spark, Spark Streaming, Scala, Oozie, HQL/SQL, Teradata

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Developed MapReduce programs for data extraction, transformation, and aggregation.
  • Monitored and troubleshot MapReduce jobs running on the cluster.
  • Implemented solutions for ingesting data from various sources and processing it using Hadoop services such as Sqoop, Hive, Pig, HBase, and MapReduce.
  • Created combiners, custom partitioners, and distributed cache usage to improve the performance of MapReduce jobs (a partitioner sketch follows this list).
  • Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Experienced in handling Avro data files by passing the schema into HDFS using Avro tools and MapReduce.
  • Optimized MapReduce algorithms using combiners and partitioners and worked on application performance optimization for an HDFS cluster.
  • Orchestrated Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
  • Used Flume to collect, aggregate, and store web log data from different sources, such as web servers, and pushed it to HDFS.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the background.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Involved in debugging MapReduce jobs using the MRUnit framework and optimizing MapReduce jobs.
  • Involved in troubleshooting errors in Shell, Hive, and MapReduce.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Designed and implemented jobs to support distributed processing using MapReduce, Hive, and Apache Pig.
  • Created Hive external tables on the MapReduce output, with partitioning and bucketing applied on top.
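
A minimal sketch of the kind of custom partitioner mentioned above (the composite key format and class name are hypothetical); it would be attached to a job with job.setPartitionerClass(RegionPartitioner.class), with a combiner set via job.setCombinerClass(...):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Hypothetical partitioner: routes records by the region prefix of a
    // composite key such as "EU|order-123", so each region lands on one reducer.
    public class RegionPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String region = key.toString().split("\\|")[0];
            return (region.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }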

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, DB2, Flume, ESP, Oozie, Maven, Unix Shell Scripting

Confidential, Eden Prairie, MN

Hadoop developer

Responsibilities:

  • Communicated effectively with business customers to gather the required information for the project.
  • Involved in loading data into HDFS from Teradata using Sqoop.
  • Experienced in moving large amounts of log file data from different servers.
  • Worked on implementing complex data transformations using the MapReduce framework.
  • Generated structured data through MapReduce jobs, stored it in Hive tables, and analyzed the results with Hive queries based on the requirements.
  • Improved performance by implementing dynamic partitioning and buckets in Hive and by designing managed and external tables.
  • Developed Pig Latin scripts and used ETL tools, including Informatica, for some pre-aggregations.
  • Worked on MapReduce programs to cleanse and pre-process data from various sources.
  • Worked with SequenceFiles and Avro files in MapReduce programs.
  • Created Hive generic UDFs to implement business logic and worked on incremental imports into Hive tables.
  • Exported the analyzed data to relational databases using Sqoop for further visualization and report generation by the BI team.
  • Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed HiveQL scripts to de-normalize and aggregate the data.
  • Loaded processed data into HBase tables using HBase Java API calls (a minimal sketch follows this list).
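
A minimal sketch of loading a row through the HBase Java API, as referenced above (the table, column family, and qualifier names are illustrative, using the Connection/Table client API):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseLoader {
        public static void main(String[] args) throws Exception {
            // Picks up hbase-site.xml from the classpath.
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("web_metrics"))) {
                Put put = new Put(Bytes.toBytes("row-2016-01-01"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("page_views"),
                        Bytes.toBytes("1250"));
                table.put(put);
            }
        }
    }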

Environment: Hadoop, Cloudera, MapReduce, Hive, Impala, Pig, HBase, Sqoop, Flume, Oozie, Java, Maven, RHEL and UNIX Shell

Confidential, Knoxville, TN

Hadoop Developer

Responsibilities:

  • Collaborated with different teams on cluster planning and hardware and network requirements to implement a 9-node Hadoop cluster using the Cloudera distribution.
  • Involved in the implementation and ongoing administration of the Hadoop infrastructure.
  • Monitored Hadoop cluster job performance and performed cluster capacity planning.
  • Worked on analyzing the Hadoop stack and developed multiple POCs using MapReduce, Pig, Hive, HBase, Sqoop, and Flume.
  • Good understanding of AWS (Amazon Web Services) EC2, RDS, and S3.
  • Implemented commissioning and decommissioning of DataNodes, killed unresponsive TaskTrackers, and dealt with blacklisted TaskTrackers.
  • Resolved tickets submitted by users and troubleshot and resolved the documented errors.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Copied data from one cluster to another using DistCp (distributed copy).
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Implemented a script to transmit information from Oracle to HBase and Cassandra using Sqoop.
  • Assisted in exporting analyzed data to the NoSQL databases Cassandra and HBase using Sqoop.
  • Worked on tuning the performance of Hive and Pig queries.
  • Performed performance tuning of Hadoop clusters and Hadoop MapReduce routines.
  • Managed and reviewed Hadoop log files.
  • Involved in HDFS maintenance and loading of structured and unstructured data from Linux machines; wrote MapReduce jobs using the Java API as well as Pig Latin.
  • Monitored Hadoop cluster connectivity and security.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
  • Worked with application teams to install OS and Hadoop updates, patches, and version upgrades as required.
  • Worked with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
  • Good with Java for writing MapReduce business logic and UDFs for Pig and Hive (a minimal Pig UDF sketch follows this list).
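
A minimal sketch of a Pig eval UDF in Java of the kind referenced above (the class name and logic are hypothetical):

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Hypothetical Pig UDF: upper-cases its single chararray argument.
    public class ToUpper extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().toUpperCase();
        }
    }

In a Pig script the JAR would be registered with REGISTER and the function called as ToUpper(field) inside a FOREACH ... GENERATE statement.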

Environment: Cloudera Distributed Hadoop (CDH4), HDFS, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Oozie, Impala, Kafka

Confidential

Java Developer

Responsibilities:

  • Created stored procedures using PL/SQL for data modification (DML insert, update, delete) in Oracle.
  • Involved in requirements analysis, development, and documentation.
  • Used MVC architecture (Jakarta Struts framework) for the web tier.
  • Participated in developing the form beans and action mappings required for the Struts implementation, and used the Struts validation framework.
  • Developed front-end screens with JSP using Eclipse.
  • Involved in the development of the Medical Records module; responsible for developing the functionality using Struts and EJB components.
  • Coded DAO objects using JDBC, following the DAO pattern (a minimal sketch follows this list).
  • Used XML and XSDs to define data formats.
  • Implemented J2EE design patterns (Value Object, Singleton, DAO) across the presentation, business, and integration tiers of the project.
  • Involved in bug fixing and functionality enhancements.
  • Designed and developed the logging mechanism for each order process using Log4j.
  • Involved in writing Oracle SQL queries.
  • Involved in the check-in and check-out process using CVS.
  • Created SAP Business Objects Reports.
  • Developed additional functionality in the software as per business requirements.
  • Involved in requirement analysis and complete development of client-side code.
  • Followed Sun coding and documentation standards.
  • Participated in project planning with business analysts and team members to analyze business requirements and translate them into working software.
  • Developed software application modules using a disciplined software development process.
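
A minimal sketch of the DAO-over-JDBC pattern referenced above (the table, column, and class names are hypothetical):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    // Hypothetical DAO: looks up a patient's name by medical record id.
    public class MedicalRecordDao {
        private final DataSource dataSource;

        public MedicalRecordDao(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public String findPatientNameById(long recordId) throws SQLException {
            String sql = "SELECT patient_name FROM medical_records WHERE record_id = ?";
            try (Connection connection = dataSource.getConnection();
                 PreparedStatement statement = connection.prepareStatement(sql)) {
                statement.setLong(1, recordId);
                try (ResultSet resultSet = statement.executeQuery()) {
                    return resultSet.next() ? resultSet.getString("patient_name") : null;
                }
            }
        }
    }

Keeping the SQL behind a DAO like this keeps the Struts actions and EJB components free of JDBC details.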

Environment: Java, J2EE, JSP, EJB, ANT, Struts 1.2, Log4j, WebLogic 7.0, JDBC, Eclipse, Windows XP, CVS, Oracle, SAP Business Objects, Netezza

Confidential

Java Developer

Responsibilities:

  • Involved in presentation-tier development using the JSF framework and ICEfaces tag libraries.
  • Involved in business requirements gathering and technical specifications.
  • Implemented J2EE standards and MVC2 architecture using the JSF framework.
  • Implemented servlets, JSP, and Ajax to design the user interface.
  • Extensive experience in building GUIs (graphical user interfaces) using JSF and ICEfaces.
  • Developed rich enterprise applications using ICEfaces and portlet technologies.
  • Experience using ICEfaces tag libraries to develop user interface components.
  • Wrote all business logic in all modules in core Java.
  • Wrote SOAP web services for sending data to and retrieving data from the external interface.
  • Developed web-based reporting for the monitoring system with HTML and Tiles using the Struts framework.
  • Implemented the middleware services layer using stateless EJBs (Enterprise JavaBeans) in a WebSphere environment.
  • Sent funds transfers to another application asynchronously using JMS.
  • Involved in implementing JMS (Java Message Service) for asynchronous communication (a minimal sender sketch follows this list).
  • Created stored procedures using PL/SQL for data modification (DML insert, update, delete) in Oracle.
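
A minimal sketch of the asynchronous JMS send described above (the JNDI names are illustrative; written against the JMS 1.1 API):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.naming.InitialContext;

    // Hypothetical sender: publishes a funds-transfer payload to a queue; the
    // receiving application consumes and processes it asynchronously.
    public class FundsTransferSender {
        public void send(String transferXml) throws Exception {
            InitialContext context = new InitialContext();
            ConnectionFactory factory =
                    (ConnectionFactory) context.lookup("jms/ConnectionFactory");
            Queue queue = (Queue) context.lookup("jms/FundsTransferQueue");

            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                producer.send(session.createTextMessage(transferXml));
            } finally {
                connection.close();
            }
        }
    }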

Environment: J2EE, EJB, JSF, ICEfaces, Web Services, XML, XSD, Agile, Microsoft Visio, ClearCase, Oracle 9i/10g, WebLogic 8.1/10.3, RAD, Log4j, Servlets, JSP, Unix
