
Spark Developer Resume


Owings Mills, MD

SUMMARY

  • Around 9 years of experience in design, analysis, and development of software applications using Big Data/Hadoop, Spark, and Java/JEE technologies.
  • Knowledge of Spark Core, Spark SQL, Spark Streaming, and machine learning using the Scala and Python programming languages.
  • Worked on open-source Apache Hadoop, Cloudera Enterprise (CDH), and Hortonworks Data Platform (HDP).
  • Hands-on experience with major components of the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, TaskTracker, NameNode, DataNode, Sqoop, Oozie, Cassandra, Flume, and Avro.
  • Developed various MapReduce applications to perform ETL workloads on terabytes of data.
  • Expertise in working with Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HQL queries.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Explored Spark for improving the performance and optimizing existing algorithms in Hadoop.
  • Experience working with Flume to load log data from multiple sources directly into HDFS.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Good understanding of RDD operations in Apache Spark, including transformations and actions, persistence/caching, accumulators, broadcast variables, and optimizing broadcasts (see the sketch after this list).
  • Hands-on experience in performing aggregations on data using Hive Query Language (HQL).
  • Developed MapReduce programs in Java.
  • Good experience in extending the core functionality of Hive and Pig by developing user-defined functions that add custom capabilities to these languages.
  • Proficient in designing and querying NoSQL databases such as HBase and MongoDB.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experience in scheduling time-driven and data-driven Oozie workflows.
  • Hands-on experience working with input file formats such as Parquet, JSON, and Avro.
  • Worked on extraction, transformation, and loading (ETL) of data from multiple sources such as flat files, XML files, and databases.
  • Hands-on experience in J2EE technologies such as Servlets, JSP, EJB, and JDBC, and in developing web service providers and consumers using SOAP and REST.
  • Used Agile development methodology and Scrum for the development process.
  • Good knowledge of HTML, CSS, JavaScript, and web-based applications.
  • Excellent analytical, problem-solving, and interpersonal skills; quick to learn new concepts and a consistent team player with excellent communication skills.
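
The RDD operations mentioned above can be illustrated with a short, self-contained sketch. This is not code from any of the projects below; it is a minimal example written against Spark's Java API (the work itself used Scala and Python), with a hypothetical input path and field layout, showing transformations vs. actions, persistence/caching, an accumulator, and a broadcast variable.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.util.LongAccumulator;

public class RddOperationsSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("rdd-operations-sketch").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Broadcast variable: a small lookup list shipped once to every executor.
        List<String> validCodes = Arrays.asList("US", "MD", "TX");
        Broadcast<List<String>> codes = sc.broadcast(validCodes);

        // Accumulator: counts malformed records seen across the cluster
        // (counts inside transformations are best-effort if tasks are retried).
        LongAccumulator badRecords = sc.sc().longAccumulator("badRecords");

        JavaRDD<String> lines = sc.textFile("hdfs:///data/input/records.csv"); // hypothetical path

        // Transformations are lazy: filter only describes the computation.
        JavaRDD<String> valid = lines.filter(line -> {
            String[] cols = line.split(",");
            boolean ok = cols.length > 1 && codes.value().contains(cols[1]);
            if (!ok) badRecords.add(1L);
            return ok;
        });

        // Persist/cache: keep the filtered RDD in memory because it is reused below.
        valid.persist(StorageLevel.MEMORY_ONLY());

        // Actions trigger execution.
        long validCount = valid.count();
        List<String> sample = valid.take(5);

        System.out.println("valid=" + validCount + " bad=" + badRecords.value() + " sample=" + sample);
        sc.stop();
    }
}
```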

TECHNICAL SKILLS

Hadoop/Big Data: Hadoop MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Spark, Kafka

Languages and Web Technologies: C, C++, C#, Scala, XML, HTML, CSS, JavaScript, J2EE, Java, Python, JSP

Frameworks: Spring, Struts, Hibernate, Servlets

Web Services: REST, SOAP

Databases: MySQL, Oracle, DB2, MongoDB, Spark SQL, HBase

Tools: Tableau, Weka

Application servers: Apache Tomcat, WebLogic 8.0

IDE/Modelling Tools: Eclipse, IntelliJ IDEA

Development Methodologies: Agile, Waterfall

Logging tools: Log4j

Operating Systems: Windows 7/8/10, Linux

PROFESSIONAL EXPERIENCE

Confidential

Spark Developer

Responsibilities:

  • Developed Apache Spark jobs using Scala in the test environment for faster data processing, and used Spark SQL for querying (see the sketch after this list).
  • Developed Scala code using a monadic pattern for different calculations based on requirements.
  • Developed and executed shell scripts to automate the jobs.
  • Wrote complex Hive queries and automated them with Azkaban for hourly analytical calculations.
  • Analyzed large data sets using Pig and Hive scripts.
  • Worked with Hue for developing Hive queries and checking data in both development and production environments.
  • Developed Pig Latin scripts for extracting data.
  • Used Pig for loading, filtering, and storing data.
  • Worked on data integration from different source systems.
  • Used Robo Mongo for working with data stored in MongoDB.
  • Worked on retrieving data from Amazon Kinesis streams and Amazon S3.
  • Created Kafka topics, integrated Kinesis streams with Kafka, and stored the data in HDFS using Gobblin.
  • Scheduled and automated jobs using Azkaban.
  • Developed Python code for creating fields in MongoDB.
  • Used Jenkins for continuous integration and build automation.
  • Completed end-to-end design of Apache NiFi flows to connect to AWS and store the final output in HDFS.
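
As a rough illustration of the Spark SQL querying referenced above: the project code was written in Scala, but the hedged sketch below uses Spark's Java API so that all examples in this document share one language. The input path, the events schema, and the hourly aggregation are assumptions, not the project's actual data model.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HourlyMetricsQuery {
    public static void main(String[] args) {
        // Master and deploy settings are expected to come from spark-submit.
        SparkSession spark = SparkSession.builder()
                .appName("hourly-metrics-query")
                .getOrCreate();

        // Load raw events from HDFS; the path and schema are hypothetical.
        Dataset<Row> events = spark.read().json("hdfs:///data/events/");
        events.createOrReplaceTempView("events");

        // Spark SQL used for querying: hourly counts per event type.
        Dataset<Row> hourly = spark.sql(
            "SELECT event_type, date_format(event_time, 'yyyy-MM-dd HH') AS hour, COUNT(*) AS cnt " +
            "FROM events GROUP BY event_type, date_format(event_time, 'yyyy-MM-dd HH')");

        // Write results back to HDFS for downstream, Azkaban-scheduled jobs.
        hourly.write().mode("overwrite").parquet("hdfs:///data/metrics/hourly/");

        spark.stop();
    }
}
```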

Environment: HDFS, Hive, Pig, Spark RDD, Spark SQL, Kafka, Spark, Scala, Python, Robo Mongo, Hortonworks, IntelliJ, Azkaban, Ambari/Hue, Jenkins, Apache NiFi.

Confidential, Owings Mills, MD

Spark Developer

Responsibilities:

  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Involved in requirements gathering, design, development, and testing.
  • Developed Pig scripts for source data validation and transformation.
  • Designed and developed tables in HBase and stored aggregated data from Hive.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Developed Spark Core and Spark SQL scripts using Scala for faster data processing.
  • Involved in code reviews and bug fixing to improve performance.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Developed design documents that weighed possible approaches and identified the best one.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Optimized Hive queries for performance tuning.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
  • Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS.
  • Populated HDFS and Cassandra with large volumes of data using Apache Kafka.
  • Basic knowledge of machine learning and predictive analytics.
  • Performed data analysis using Spark with Scala.
  • Analyzed and reported on the data using Tableau.
  • Created dashboards in Tableau.
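
A minimal sketch of the Kafka-to-Cassandra streaming path described above. The project used Scala; for consistency with the other examples here, this version uses Spark's Java streaming API (spark-streaming-kafka-0-10) together with the DataStax spark-cassandra-connector Java API. The broker address, topic name, keyspace, table, and the LearnerEvent bean are assumptions, not the actual learner data model.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import com.datastax.spark.connector.japi.CassandraJavaUtil;

public class LearnerEventStream {

    // Hypothetical bean matching a Cassandra table learner.events(id, event_type, payload).
    public static class LearnerEvent implements java.io.Serializable {
        private String id; private String eventType; private String payload;
        public LearnerEvent() {}
        public LearnerEvent(String id, String eventType, String payload) {
            this.id = id; this.eventType = eventType; this.payload = payload;
        }
        public String getId() { return id; }               public void setId(String v) { this.id = v; }
        public String getEventType() { return eventType; } public void setEventType(String v) { this.eventType = v; }
        public String getPayload() { return payload; }     public void setPayload(String v) { this.payload = v; }
    }

    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("learner-event-stream")
                .set("spark.cassandra.connection.host", "127.0.0.1"); // hypothetical host
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");       // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "learner-model");
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream from Kafka; "learner-events" is a placeholder topic name.
        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("learner-events"), kafkaParams));

        // Transform each Kafka record into the learner data model and persist to Cassandra.
        stream.foreachRDD(rdd -> {
            JavaRDD<LearnerEvent> events = rdd.map(record ->
                    new LearnerEvent(record.key(), "click", record.value())); // hypothetical mapping
            CassandraJavaUtil.javaFunctions(events)
                    .writerBuilder("learner", "events", CassandraJavaUtil.mapToRow(LearnerEvent.class))
                    .saveToCassandra();
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```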

Environment: HDFS, Hive, Pig, Spark RDD, Spark Streaming, Spark SQL, HBase, Sqoop, Oozie, Kafka, Cassandra, Scala, Tableau

Confidential, Owings Mills, MD

Hadoop Developer

Responsibilities:

  • Worked on importing data from various sources and performed transformations using MapReduce and Hive to load data into HDFS.
  • Worked on compression mechanisms to optimize MapReduce jobs.
  • Developed Big Data solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Created scripts to automate the process of data ingestion.
  • Performed joins, group-bys, and other operations in MapReduce using Java and Pig.
  • Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows.
  • Worked on setting up Pig, Hive, and HBase on multiple nodes and developed applications using Pig, Hive, HBase, and MapReduce.
  • Worked on the conversion of existing MapReduce batch applications for better performance.
  • Created HBase tables to store variable data formats coming from different portfolios.
  • Performed real-time analytics on HBase using the Java API and REST API (see the sketch after this list).
  • Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
  • Analyzed customer behavior by performing clickstream analysis and used Flume to ingest the data.
  • Worked on Avro data files using the Avro serialization system.
  • Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
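
A hedged sketch of reading and writing HBase through the Java client API, as referenced above. The table name, row key layout, and column family are hypothetical; the real portfolio schemas are not reproduced here.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PortfolioHBaseClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("portfolio_events"))) { // hypothetical table

            // Write one row: row key = portfolio id + date, one column family "d".
            Put put = new Put(Bytes.toBytes("PF1001-20240101"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("format"), Bytes.toBytes("json"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"), Bytes.toBytes("{\"amount\": 42}"));
            table.put(put);

            // Read it back for near-real-time analytics.
            Get get = new Get(Bytes.toBytes("PF1001-20240101"));
            Result result = table.get(get);
            String format = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("format")));
            System.out.println("stored format = " + format);
        }
    }
}
```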

Environment: HDFS, Hive, MapReduce, Pig, Sqoop, RDBMS, HBase, Java API, REST API, Cloudera, Avro, Flume.

Confidential, Houston, TX

Hadoop Developer

Responsibilities:

  • Installed and configured Apache Hadoop to test the maintenance of log files in the Hadoop cluster.
  • Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Involved in loading data from the UNIX file system to HDFS.
  • Evaluated business requirements and prepared detailed specifications following project guidelines for program development.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Provided quick responses to ad hoc internal and external client requests for data and created ad hoc reports.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Migrated ETL processes from Oracle to Hive to test easier data manipulation.
  • Optimized Pig scripts and Hive queries to increase efficiency and added new features to existing code.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Developed Hive queries to process the data.
  • Created Hive tables, loaded data into them, and wrote Hive UDFs (see the sketch after this list).
  • Used Sqoop to import data into HDFS and Hive from other data systems
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Conducted unit testing for the development team within the sandbox environment.
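
The Hive UDF work noted above might look roughly like the sketch below. This is a generic, hypothetical UDF (not one of the project's actual functions) built on the classic org.apache.hadoop.hive.ql.exec.UDF base class.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Hypothetical Hive UDF that trims and upper-cases a column value,
 * e.g. SELECT clean_code(raw_code) FROM staging_table;
 */
public class CleanCodeUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // preserve NULLs rather than failing the query
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

Such a class is packaged into a JAR, added to the session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before it can be called from HiveQL.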

Environment: Hadoop Cluster, Hive, Pig, Sqoop, Oozie, Oracle, Cloudera Manager, UNIX, ETL

Confidential, Bluebell, PA

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch after this list).
  • Good understanding of and related experience with the Hadoop stack: internals, Hive, Pig, and MapReduce.
  • The system was initially developed in Java; the Java filtering program was restructured so that the business rule engine lives in a JAR that can be called from both Java and Hadoop.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Involved in defining job flows.
  • Involved in managing and reviewing Hadoop log files.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from the UNIX file system to HDFS.
  • Installed and configured Hive and developed Hive UDFs to extend its core functionality.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Monitored system health and logs and responded to any warning or failure conditions.
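
The data-cleaning MapReduce jobs mentioned at the top of this list were of the general shape sketched below: a map-only job that drops malformed records. The delimiter, field count, and paths are hypothetical, not the project's actual formats.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Map-only job that drops malformed records and normalizes field separators. */
public class CleanRecordsJob {

    public static class CleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length != 5) {
                return; // skip malformed records
            }
            context.write(NullWritable.get(), new Text(String.join(",", fields)));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-records");
        job.setJarByClass(CleanRecordsJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0); // map-only cleaning pass
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```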

Environment: Apache Hadoop, HDFS, MapReduce, Pig, Hive, Hive UDFs, UNIX, Java, ETL, Eclipse.

Confidential, New York, NY

Java Developer

Responsibilities:

  • Designed the application using Agile Methodology.
  • Developed a Maven-based project structure with data-layer, ORM, and web modules.
  • Developed an MVC-based website using JSF and Spring.
  • Designed and developed HTML pages and JSP pages.
  • Developed business components using the Spring framework and database connections using JDBC.
  • Responsible for creating tables of client information and writing Hibernate mapping files to manage one-to-one and one-to-many relationships (see the sketch after this list).
  • Implemented data reading, saving, and modification using stored procedures in the MySQL database and Hibernate Criteria.
  • Developed graphical user interfaces using JSF, JSP, HTML, CSS, and JavaScript.
  • Installed and configured the development environment using Eclipse with the WebLogic application server.
  • On the server side, exposed access to the application and returned results over the network using RESTful web services.
  • Developed the XML Gateway to help the ordering process system communicate with the Order Execution Tool and different online tools such as the Line Qualification, Billing Information, and Credit Card Validation systems.
  • Used Node.js to develop a scalable web application.
  • Worked in a test-driven development (TDD) environment using Agile methodologies.
  • Used JUnit to test and debug the application.
  • Implemented a payment gateway using PayPal.
  • Developed Maven build scripts to build, package, test, and deploy the application on the application server.
  • Implemented an auditing tool using Log4j.
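
The Hibernate relationships referenced above were defined in XML mapping files; the sketch below shows an equivalent, hypothetical one-to-many mapping using JPA annotations instead. The entity names, columns, and cascade choice are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;
import javax.persistence.Table;

/** Hypothetical client entity: one client owns many accounts. */
@Entity
@Table(name = "client")
public class Client {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;

    // One-to-many side of the relationship; cascades saves/deletes to accounts.
    @OneToMany(mappedBy = "client", cascade = CascadeType.ALL)
    private List<Account> accounts = new ArrayList<>();

    // getters/setters omitted for brevity
}

@Entity
@Table(name = "account")
class Account {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String accountNumber;

    // Many-to-one back-reference; maps the client_id foreign key column.
    @ManyToOne
    @JoinColumn(name = "client_id")
    private Client client;

    // getters/setters omitted for brevity
}
```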

Environment: MVC, HTML, JSP, XML, Maven, JavaScript, Node.js, TDD, JUnit, Log4j, JDBC, MySQL, Hibernate, WebLogic, JSF, Spring, Eclipse.

Confidential, Bridgewater, NJ

Java Developer

Responsibilities:

  • Designed use case diagrams, class models, and sequence diagrams for the SDLC process of the application.
  • Implemented GUI pages using JavaScript, HTML, JSP, CSS, and AJAX.
  • Designed and developed UI components using JSP, JMS, and JSTL.
  • Deployed the project on the WebSphere application server in a Linux environment.
  • Implemented the online application using web services (SOAP), JSP, Servlets, JDBC, and core Java.
  • Implemented the Singleton, DAO, and Factory design patterns based on the application requirements.
  • Used DOM and SAX parsers to parse the raw XML documents (see the sketch after this list).
  • Tested the web services with the SOAP UI tool.
  • Developed back-end interfaces using PL/SQL packages, stored procedures, functions, anonymous PL/SQL blocks, cursor management, and exception handling.
  • Tuned complex database queries and table joins to improve application performance.
  • Used Eclipse as the development IDE for web applications.
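
As a rough illustration of the DOM parsing mentioned above, the sketch below reads a hypothetical orders.xml file with the standard javax.xml.parsers API; the element and attribute names are assumptions, not the project's actual document structure.

```java
import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Parses a hypothetical orders.xml document and prints each order id and amount. */
public class OrderXmlParser {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File("orders.xml")); // hypothetical input file
        doc.getDocumentElement().normalize();

        NodeList orders = doc.getElementsByTagName("order");
        for (int i = 0; i < orders.getLength(); i++) {
            Element order = (Element) orders.item(i);
            String id = order.getAttribute("id");
            String amount = order.getElementsByTagName("amount").item(0).getTextContent();
            System.out.println("order " + id + " amount " + amount);
        }
    }
}
```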

Environment: JDBC, HTML, CSS, JSP, AJAX, XML, SOAP, DOM, SAX, PL/SQL, Eclipse, Servlets.

Confidential

Java Developer

Responsibilities:

  • Involved in the complete software development life cycle (SDLC) of the application, from requirements gathering and analysis to testing and maintenance.
  • Worked with the business community to define business requirements and analyze the possible technical solutions.
  • Performed requirements gathering, business process flow, business process modeling, and business analysis.
  • Implemented the user login logic using the Spring MVC framework, which encourages application architectures based on the Model-View-Controller design paradigm (see the sketch after this list).
  • Used various Java and J2EE APIs, including JDBC, XML, Servlets, and JSP.
  • Generated Hibernate mapping files and created the data model using those mapping files.
  • Developed the UI using JavaScript, JSP, HTML, and CSS for interactive cross-browser functionality and a complex user interface.
  • Developed action classes and form beans and configured struts-config.xml.
  • Provided client-side validations using Struts Validator framework and JavaScript
  • Created business logic using servlets and session beans and deployed them on Apache Tomcat server
  • Created complex SQL Queries, PL/SQL Stored procedures and functions for back end
  • Prepared the functional, design and test case specifications
  • Performed unit testing, system testing and integration testing
  • Used JUnit for unit testing of the application
  • Provided technical support for production environments: resolving issues, analyzing defects, and providing and implementing solutions.
  • Resolved high-priority defects per the schedule.
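
A minimal, hypothetical sketch of the Spring MVC login flow described above: a controller that renders the login view and handles the POST. The URL, view names, and the inline credential check are placeholders; a real implementation would delegate to a service/DAO layer as the rest of this section describes.

```java
import javax.servlet.http.HttpSession;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

/** Hypothetical Spring MVC controller handling the user login flow. */
@Controller
public class LoginController {

    @RequestMapping(value = "/login", method = RequestMethod.GET)
    public String showLoginForm() {
        return "login"; // resolved to login.jsp by the configured view resolver
    }

    @RequestMapping(value = "/login", method = RequestMethod.POST)
    public String doLogin(@RequestParam("username") String username,
                          @RequestParam("password") String password,
                          HttpSession session) {
        // Placeholder check; real logic would call a service/DAO backed by the database.
        if ("demo".equals(username) && "demo".equals(password)) {
            session.setAttribute("user", username);
            return "redirect:/home";
        }
        return "login";
    }
}
```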

Environment: SDLC, Spring MVC, JSP, Servlets, JavaScript, SQL, HTML, CSS, PL/SQL, Hibernate, JUnit.
