
Hadoop Spark Developer Resume


Providence, RI

SUMMARY

  • 8+ years of overall experience as a Java developer and 4+ years as a Data Engineer using Hadoop and related Big Data technologies.
  • Expertise in using Hadoop ecosystem components such as Spark, MapReduce, HDFS, Pig, Hive, Impala, YARN, Sqoop, Kafka, Oozie and Ambari.
  • Strong understanding of Hadoop Fundamentals and distributed computing methodologies.
  • Strong experience working with the Spark framework and building Scala-based Spark applications.
  • Used core Spark, Spark DataFrames, Spark SQL and Spark MLlib APIs effectively.
  • Strong experience working with Kafka and Spark Streaming.
  • Wrote Kafka producers for streaming millions of events into Kafka topics (see the producer sketch after this list).
  • Expertise in working with Hive, using partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Strong experience developing custom UDFs and using custom SerDes in Hive.
  • Expertise in creating Hive internal/external tables and views using a shared metastore, writing scripts in HiveQL, and data transformation and file processing using Pig Latin scripts.
  • Handled importing and exporting large data sets between relational databases and HDFS using Sqoop.
  • Experience in developing complex MapReduce programs.
  • Experience working with AWS EMR clusters, Cloudera, Microsoft Azure and Hortonworks distributions.
  • Strong experience designing, developing, troubleshooting and optimizing end-to-end data pipelines.
  • Good experience working with data scientists to productionize machine learning models and prepare feature datasets.
  • Knowledge of setting up workflows with the Apache Oozie workflow engine and scheduling Hadoop jobs with the Oozie coordinator.
  • Experience in working with NoSQL databases such as HBase and Cassandra.
  • Excellent understanding of Hadoop architecture and hands-on experience in writing MapReduce programs.
  • Experience in integrating Pig with Hive and HBase using HCatalog.
  • Good knowledge of HDFS and YARN services, including ResourceManager, NodeManager, NameNode and DataNode.
  • Good knowledge of IDEs such as Eclipse, NetBeans and IntelliJ IDEA.
  • Good knowledge of Hue (an open-source web interface), Ambari (Hortonworks), Microsoft Azure and Cloudera's Distribution Including Apache Hadoop (CDH) for analyzing data with Apache Hadoop.
  • Good knowledge of importing streaming data into HDFS using Flume.
  • Good knowledge of writing custom UDFs for Pig and Hive to generate the required results.
  • Good knowledge of ETL tools like Talend.
  • Good knowledge of integrating Talend with Hadoop.
  • Experience in using Tableau.
  • Knowledge of the development process using Agile methodology.
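
A minimal sketch of the kind of Kafka producer referenced above, assuming a plain String key/value topic; the broker list, topic name and payload are illustrative placeholders, not details from any specific project.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Broker addresses and serializers; "acks=all" waits for full replication
    props.put("bootstrap.servers", "broker1:9092,broker2:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Key the event so records for the same user land on the same partition
      val record = new ProducerRecord[String, String](
        "click-events", "user-123", """{"page":"home","ts":1589000000}""")
      producer.send(record)
    } finally {
      producer.flush()
      producer.close()
    }
  }
}
```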

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Impala, HBase, Cassandra, Sqoop, Spark, Spark SQL, Spark Streaming, Zookeeper, Oozie, Kafka, Flume, Hue, Cloudera Manager, Amazon AWS, Hortonworks clusters, Microsoft Azure

Java/J2EE Web Technologies: J2EE, JMS, JSF, Servlets, HTML, CSS, XML, XHTML, AJAX, Angular JS, JSP, JSTL

Languages: C, C++, Core Java, Shell Scripting, PL/SQL, Python, Pig Latin, Scala

Scripting Languages: JavaScript and UNIX Shell Scripting, Python

Operating system: Windows, MacOS, Linux and Unix

Design: UML, Rational Rose, Microsoft Visio, E-R Modelling

DBMS / RDBMS: Oracle 11g/10g/9i, Microsoft SQL Server 2012/2008, MySQL, DB2, Teradata; NoSQL: MongoDB, Cassandra, HBase

IDE and Build Tools: Eclipse, NetBeans, Microsoft Visual Studio, Ant, Maven, JIRA, Confluence

Version Control: SVN, CVS, GIT

Web Services: SOAP, RESTful, JAX-WS

Web Servers: WebLogic, WebSphere, Apache Tomcat, Jetty

PROFESSIONAL EXPERIENCE

Hadoop Spark Developer

Confidential, Providence, RI

Responsibilities:

  • Worked on building end-to-end Scala-based Spark applications for cleansing, auditing and transforming raw data feeds from multiple report suites.
  • Developed custom UDFs in Scala to be used in Spark SQL (see the sketch after this list).
  • Improved the performance of existing spark applications.
  • Worked extensively on Hive, creating numerous internal and external tables as part of the requirements.
  • Wrote custom UDFs in Hive according to business requirements.
  • Hands-on experience with data loading tools like Sqoop.
  • Experience in using Oozie for workflow design and Oozie coordinator for scheduling workflows.
  • Used Spark DataFrames, Spark SQL and Spark MLlib extensively.
  • Created batch and real-time pipelines using Spark as the main processing framework.
  • Integrated Kafka with Spark Streaming for real-time data processing.
  • Worked closely with the business, translating business requirements into technical requirements.
  • Hands on experience in working with AWS Cloud Services like EMR, S3 and Redshift.
  • Participated in design reviews and daily project scrums.
  • Successfully loaded files to AWS S3 from Teradata, and loaded from AWS S3 to Redshift.
  • Knowledge in installing and configuring various services on the Hadoop cluster.
  • Expert in implementing advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Used Flume, Sqoop, Hadoop, Apache Spark and Oozie for building data pipelines.
  • Contributed towards developing a data pipeline to load data from different sources such as web, RDBMS and NoSQL into Apache Kafka or the Apache Spark cluster.
  • Created and developed an end-to-end Hadoop data ingestion pipeline, ingesting raw SQL Server data into S3, processing it with Spark, and pushing the processed data to Redshift for RI reports.
  • Developed a Spark Streaming pipeline that ingests activity data and email delivery events into AWS S3 using Kinesis, processes the data with Spark, and stores it in an AWS S3 bucket and Redshift.
  • Implemented the Databricks API in a Scala program to push the processed data to Redshift, which provides columnar, compressed storage and scales linearly and seamlessly.
  • Implemented the AWS S3 API to access S3 buckets for data processing, and developed custom aggregate functions using Spark SQL for interactive querying.
  • Developed distributed Big Data applications using open-source frameworks such as Apache Spark, Apex, Flink, Storm, NiFi and Kafka.
  • Worked on Spark SQL and Spark Streaming.
  • Implemented Spark SQL to access Hive tables in Spark for faster data processing.
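
A minimal sketch of registering a Scala UDF for use in Spark SQL, as mentioned above; the UDF logic, database and table names are illustrative assumptions, not the actual project code.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-udf-sketch")
      .enableHiveSupport()          // lets Spark SQL read and write Hive tables
      .getOrCreate()

    // Register a plain Scala function as a Spark SQL UDF
    spark.udf.register("normalize_code", (s: String) =>
      if (s == null) null else s.trim.toUpperCase)

    // Apply the UDF while cleansing a raw feed (placeholder table names)
    val cleaned = spark.sql(
      "SELECT report_suite, normalize_code(country_code) AS country_code FROM raw_db.visits")
    cleaned.write.mode("overwrite").saveAsTable("curated_db.visits_clean")

    spark.stop()
  }
}
```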

Environment: Cloudera Distribution, Microsoft Azure, AWS EMR, Yarn, Spark, Hive, Sqoop, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Shell Scripting.

Hadoop Spark Developer

Confidential, Plano TX

Responsibilities:

  • Involved in loading and transforming large sets of Structured and Semi-Structured data and analyzed them by running Hive queries and Pig scripts.
  • Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from multiple data sources to make it suitable for ingestion into Hive schema for analysis.
  • Designed and developed MapReduce jobs to process data coming in different file formats such as XML, CSV and JSON.
  • Experience with Hive partitioning, bucketing and joins on Hive tables, utilizing Hive SerDes such as RegEx, JSON and Avro (see the partitioning sketch after this list).
  • Executed tasks for upgrading cluster on the staging platform before doing it on production cluster.
  • Performed maintenance, monitoring, deployments and upgrades across the infrastructure that supports all our Hadoop clusters.
  • Optimized Hive analytics SQL queries, created tables and views, wrote custom UDFs and implemented Hive-based exception processing.
  • Involved in transferring relational database tables to HDFS and Hive using Sqoop, and vice versa.
  • Stored the processed data by using low-level Java APIs to ingest data directly into HBase and HDFS.
  • Exported the analyzed data to relational databases using Sqoop to generate reports for the BI team.
  • Experience in managing and reviewing Hadoop log files.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig.
  • Worked with various Hadoop distributions (Cloudera, Hortonworks, Amazon AWS).
  • Used the Spark SQL and DataFrame APIs extensively to build Spark applications.
  • Protected the cluster with VPCs and security group settings to ensure only the required firewall access is provided; users access the resources through AWS roles.
  • Used Spark SQL for data analysis and handed the results to the data scientists for further analysis.
  • Expert in implementing advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Automated all the jobs, for pulling data from FTP server to load data into Hive tables, using Oozie workflows.
  • Migrated data from Spark RDDs into HDFS and NoSQL stores like Cassandra/HBase.
  • Worked on reading multiple data formats from HDFS using PySpark.
  • Performed streaming data ingestion with Kafka into the Spark environment.
  • Built a prototype for real time analysis using Spark streaming and Kafka.
  • Involved in writing the shell scripts for exporting log files to Hadoop cluster through automated process.
  • Worked with different compression techniques such as LZO and Snappy to save storage and optimize data transfer over the network.
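
A minimal sketch of the Hive partitioning and bucketing referenced above, driven through Spark SQL with Hive support; the database, table, column names and paths are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-bucket-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic partition inserts
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Partitioned and bucketed Hive table (placeholder names)
    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.events
        |(event_id STRING, payload STRING)
        |PARTITIONED BY (event_date STRING)
        |CLUSTERED BY (event_id) INTO 16 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Stage raw JSON and insert it, letting Hive derive the partition per row
    spark.read.json("/data/staging/events").createOrReplaceTempView("staged_events")
    spark.sql(
      """INSERT INTO TABLE analytics.events PARTITION (event_date)
        |SELECT event_id, payload, event_date FROM staged_events""".stripMargin)

    spark.stop()
  }
}
```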

Environment: Hadoop, AWS, Spark, Scala, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Microsoft Azure, Pig, Sqoop, Oozie, Zookeeper, Storm, PL/SQL, MySQL, NoSQL, Elasticsearch, HBase, UNIX, SDLC

Big data Hadoop Developer

Confidential, Oklahoma City, OK

Responsibilities:

  • Developed complex MapReduce jobs in Java to perform data extraction, aggregation and transformation
  • Loaded data into HDFS from data sources such as Oracle and DB2 using Sqoop, and loaded it into Hive tables.
  • Analyzed big data sets by running Hive queries and Pig scripts.
  • Integrated the Hive warehouse with HBase for information sharing among teams.
  • Developed Sqoop scripts for the interaction between Pig and the MySQL database.
  • Worked on Static and Dynamic partitioning and Bucketing in Hive.
  • Scripted complex HiveQL queries on Hive tables for analytical functions.
  • Developed complex Hive UDFs to work with sequence files (a simplified UDF sketch follows this list).
  • Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
  • Created dashboards in Tableau to create meaningful metrics for decision making.
  • Performed rule checks on multiple file formats like XML, JSON, CSV and compressed file formats.
  • Monitored system health and logs and responded to any warning or failure conditions.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Used storage formats like Avro to access data quickly in complex queries.
  • Implemented Counters for diagnosing problem in queries and for quality control and application-level statistics.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
  • Implemented Log4j to trace logs and to track information.
  • Developed helper classes abstracting the Cassandra cluster connection to act as a core toolkit.
  • Installed the Oozie workflow engine and scheduled it to run date/time-dependent Hive and Pig jobs.
  • Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
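
A simplified sketch of a Hive UDF of the kind mentioned above, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and masking logic are illustrative assumptions, not the original UDFs.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// After packaging into a jar:
//   ADD JAR hive-udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccount';
class MaskAccount extends UDF {
  // Hive resolves evaluate() by reflection on the argument types
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val value = input.toString
    // Keep the last four characters visible, mask the rest
    val masked = ("*" * math.max(0, value.length - 4)) + value.takeRight(4)
    new Text(masked)
  }
}
```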

Environment: HDFS, MapReduce, Cassandra, Hive, Pig, Sqoop, Tableau, NoSQL, Shell Scripting, Maven, Git, HDP Distribution, Eclipse, Log4j, JUnit, Linux.

Hadoop Developer

Confidential, Pleasanton, CA

Responsibilities:

  • Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig and Hive.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Exported the analysed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked with different compression techniques such as LZO, Snappy and Bzip2 to save storage and optimize data transfer over the network (a compression-setup sketch follows this list).
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop and Zookeeper.
  • Used Sqoop to store the data into HBase and Hive.
  • Worked on installing cluster, commissioning & decommissioning of Data Node, Name Node high availability, capacity planning, and slots configuration.
  • Created Hive tables, dynamic partitions and buckets for sampling, and worked on them using HiveQL.
  • Used Pig to parse the data and store it in Avro format.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Implemented a script to transmit information from Oracle to HBase using Sqoop.
  • Implemented MapReduce programs to handle semi/unstructured data like XML, JSON, and sequence files for log files.
  • Fine-tuned Pig queries for better performance.
  • Involved in writing the shell scripts for exporting log files to Hadoop cluster through automated process.
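
A minimal sketch of enabling compressed output on a MapReduce job, along the lines mentioned above; the job name, paths and choice of Snappy are illustrative assumptions (mapper and reducer classes are omitted).

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.compress.SnappyCodec
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

object CompressedOutputJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "compressed-output-sketch")
    job.setJarByClass(getClass)

    // Placeholder input/output locations on HDFS
    FileInputFormat.addInputPath(job, new Path("/data/raw/logs"))
    FileOutputFormat.setOutputPath(job, new Path("/data/processed/logs"))

    // Compress the job output with Snappy to cut storage and network transfer
    FileOutputFormat.setCompressOutput(job, true)
    FileOutputFormat.setOutputCompressorClass(job, classOf[SnappyCodec])

    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```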

Environment: Hadoop, MapReduce, HDFS, Yarn, Sqoop, Oozie, Pig, Hive, HBase, Java, Eclipse, UNIX shell scripting, Python, Hortonworks.

Java Developer

Confidential, Fargo, ND

Responsibilities:

  • Involved in analysis, design, implementation and bug-fixing activities.
  • Involved in reviews of functional and technical specification documents.
  • Created and configured domains in production, development and testing environments using configuration wizard.
  • Involved in creating and configuring the clusters in production environment and deploying the applications on clusters.
  • Deployed and tested the application using Tomcat web server.
  • Analysis of the specifications provided by the clients.
  • Involved in the design of the application.
  • Ability to understand functional requirements and design documents.
  • Developed use case diagrams, class diagrams, sequence diagrams and data flow diagrams.
  • Coordinated with other functional consultants.
  • Web related development with JSP, AJAX, HTML, XML, XSLT, and CSS.
  • Created and enhanced stored procedures, PL/SQL and SQL for the Oracle 9i RDBMS.
  • Designed and implemented a generic parser framework using a SAX parser to parse XML documents that store SQL (a parser sketch follows this list).
  • Deployed the application on WebLogic Application Server 9.0.
  • Extensively used UNIX/FTP for shell scripting and pulling logs from the server.
  • Provided further maintenance and support; this involved working with the client and solving their problems, including major bug fixes.
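
A minimal sketch of the SAX-based parsing approach referenced above, collecting the text of a SQL-bearing element; the element name ("sql") and file name are illustrative assumptions.

```scala
import java.io.File
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.helpers.DefaultHandler
import scala.collection.mutable.ListBuffer

// Gathers the contents of every <sql> element in the document
class SqlStatementHandler extends DefaultHandler {
  val statements = ListBuffer[String]()
  private val buffer = new StringBuilder
  private var inSql = false

  override def startElement(uri: String, localName: String, qName: String, attrs: Attributes): Unit =
    if (qName == "sql") { inSql = true; buffer.clear() }

  override def characters(ch: Array[Char], start: Int, length: Int): Unit =
    if (inSql) buffer.appendAll(ch, start, length)

  override def endElement(uri: String, localName: String, qName: String): Unit =
    if (qName == "sql") { statements += buffer.toString.trim; inSql = false }
}

object SqlXmlParser {
  def main(args: Array[String]): Unit = {
    val handler = new SqlStatementHandler
    SAXParserFactory.newInstance().newSAXParser().parse(new File("queries.xml"), handler)
    handler.statements.foreach(println)
  }
}
```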

Environment: Java 1.4, WebLogic Server 9.0, Oracle 10g, Web services monitoring, Web Drive, UNIX/Linux, JavaScript, HTML, CSS, XML.

Java / J2EE Developer

Confidential

Responsibilities:

  • Developed the application using the Struts framework, which leverages the classical Model View Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams (sequence and collaboration) and activity diagrams were used.
  • Gathered business requirements and wrote functional specifications and detailed design documents.
  • Extensively used Core Java, Servlets, JSP and XML
  • Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 9i database
  • Implemented an Enterprise Logging Service (ELS) using JMS and Apache CXF.
  • Developed unit test cases and used JUnit for unit testing of the application.
  • Implemented Framework Component to consume ELS service.
  • Implemented JMS producer and Consumer using Mule ESB.
  • Wrote SQL queries, stored procedures, and triggers to perform back-end database operations
  • Designed Low Level design documents for ELS Service.
  • Developed SQL stored procedures and prepared statements for updating and accessing data in the database (a JDBC sketch follows this list).
  • Development was carried out in the Eclipse Integrated Development Environment (IDE).
  • Used JBoss for deploying various components of application.
  • Involved in Unit testing, Integration testing and User Acceptance testing.
  • Utilized Java and SQL daily to debug and fix issues with client processes.
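
A minimal sketch of the prepared-statement and stored-procedure calls mentioned above, using plain JDBC; the connection URL, credentials, table and procedure names are illustrative placeholders.

```scala
import java.sql.DriverManager

object AccountDaoSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder Oracle connection details
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@dbhost:1521:ORCL", "app_user", "app_password")
    try {
      // Parameterized update via PreparedStatement
      val update = conn.prepareStatement(
        "UPDATE accounts SET status = ? WHERE account_id = ?")
      update.setString(1, "ACTIVE")
      update.setLong(2, 1001L)
      update.executeUpdate()
      update.close()

      // Invoke a stored procedure via CallableStatement
      val call = conn.prepareCall("{call refresh_account_summary(?)}")
      call.setLong(1, 1001L)
      call.execute()
      call.close()
    } finally {
      conn.close()
    }
  }
}
```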

Environment: Java, Spring Core, JBoss, JUnit, JMS, JDK, SVN, Maven, Servlets, JSP and XML.
