
Sr. Hadoop/Spark Developer Resume


Malvern, PA

EXPERIENCE SUMMARY:

  • Hadoop Developer with 8 years of overall IT experience across a variety of industries, including hands-on experience in Big Data technologies.
  • 4+ years of comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, Spark, and HBase).
  • Proficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.
  • Expert in working with the Hive data warehouse tool - creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Debugged Pig and Hive scripts and optimized and debugged MapReduce jobs.
  • Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
  • Hands-on experience in managing and reviewing Hadoop logs, with good knowledge of YARN configuration.
  • Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
  • Extended Hive and Pig core functionality by writing custom UDFs; experienced with Hortonworks and Ambari.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
  • Extensive experience in importing and exporting data using Flume and Kafka.
  • Experience consuming data from Kafka topics using Spark.
  • Good working knowledge of NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Used HBase alongside Pig/Hive when real-time, low-latency queries were required.
  • Knowledge of job workflow scheduling and monitoring tools such as Oozie (Hive, Pig) and ZooKeeper (HBase).
  • Worked on installing and configuring Hadoop clusters on the Cloudera distribution.
  • Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
  • Experience in developing solutions to analyze large data sets efficiently.
  • Good working experience with Spark (Spark Streaming, Spark SQL), Scala, and Kafka.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
  • Integrated Apache Storm with Kafka to perform web analytics, loading clickstream data from Kafka into HDFS, HBase, and Hive.
  • Maintained list of source systems and data copies, tools used in data ingestion, and landing location in Hadoop.
  • Wrote shell scripts that ran multiple Hive jobs to incrementally refresh Hive tables used to generate Tableau reports for the business.
  • Developed various shell and Python scripts to address production issues.
  • Developed and designed automation framework using Python and Shell scripting.
  • Developed Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Configured AWS EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
  • Good knowledge of compression and serialization formats such as Snappy and Avro.
  • Experienced in Database development, ETL, OLAP and OLTP.
  • Handled large transaction volumes while interfacing with a front-end application built with Java, JSP, Struts, Hibernate, and SOAP web services on the Tomcat web server.
  • Delivered zero-defect code for three large projects involving changes to both the front end (Core Java, presentation services) and the back end (Oracle).
  • Experience with all stages of the SDLC and the Agile development model, from requirements gathering through deployment and production support.
  • Participated in daily Scrum meetings to discuss development progress and actively helped make the meetings more productive.
  • Experienced in understanding existing systems and providing maintenance and production support on technologies such as Java, J2EE, and various databases (Oracle, SQL Server).
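
The bullet above on converting Hive/SQL queries into Spark transformations can be illustrated with a minimal sketch in Scala. The HDFS paths, delimiter, and column positions below are assumptions for illustration only, not details from a specific project.

    // Sketch: re-expressing a HiveQL aggregate as Spark RDD transformations.
    // Rough HiveQL equivalent (illustrative):
    //   SELECT category, SUM(amount) FROM sales GROUP BY category;
    import org.apache.spark.{SparkConf, SparkContext}

    object SalesByCategory {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("sales-by-category"))

        val totals = sc.textFile("hdfs:///data/sales/part-*")      // assumed HDFS location
          .map(_.split(","))                                        // assumed comma-delimited rows
          .filter(_.length >= 3)                                    // drop malformed rows
          .map(cols => (cols(1), cols(2).toDouble))                 // (category, amount)
          .reduceByKey(_ + _)                                       // GROUP BY category + SUM(amount)

        totals.saveAsTextFile("hdfs:///data/sales_by_category")     // assumed output path
        sc.stop()
      }
    }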

TECHNICAL SKILLS:

Hadoop Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, ZooKeeper, Flume, Spark SQL, and MongoDB

Distributed File Systems: Apache Hadoop HDFS

Hadoop Distributions: Amazon AWS/EMR, Apache, Cloudera, Hortonworks, and MapR

Operating Systems: Windows, Linux, Unix.

Languages: Java, J2EE, C, SQL, PL/SQL, Python

NoSQL Databases: Cassandra, HBase, MongoDB

Databases: Oracle, SQL Server and MySQL

Search Platforms: Apache Solr

Cloud platforms: Amazon AWS, OpenStack

IDE: IBM RAD, Eclipse.

Tools: TOAD, SQL Developer, Ant, Log4j

Application Servers: JBoss, Tomcat, WebLogic, WebSphere

ETL: Talend ETL, Talend Studio

Frameworks: Hibernate, Spring, Struts and JMS

PROFESSIONAL EXPERIENCE:

Confidential, Malvern, PA

Sr. Hadoop/Spark Developer

Responsibilities:

  • Responsible for monitoring Cluster using Cloudera Manager.
  • Processed big data using Scala, AWS, and Redshift.
  • Hands-on experience with Spark, creating and handling DataFrames in Spark with Scala.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Configured different topologies for the Spark cluster and deployed them on a regular basis.
  • Used Spark Streaming APIs to perform on-the-fly transformations and actions for building the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (see the sketch after this section).
  • Developed a Spark Streaming platform and provided analytics based on configurations stored in ZooKeeper.
  • Performed data validity checks on the imported data using Spark in Scala.
  • Experienced in handling large datasets using partitioning, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
  • Designed and built a reporting application that uses Spark SQL to fetch Cassandra table data and generate reports.
  • Experienced with MPP databases such as HAWQ.
  • Developed quality code adhering to Scala coding Standards and best practices.
  • Implemented design patterns in Scala for the application.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Sqoop, and Spark (Scala).
  • Experienced in writing and implementing MapReduce jobs.
  • Handled importing of data from various sources such as Oracle and MySQL using Sqoop, performed transformations using Hive, and loaded the data into HDFS.
  • Created an XML schema for the Solr search engine based on the Oracle schema and documented the Solr REST API.
  • Used Solr with sharding and replication via SolrCloud.
  • Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
  • Experience with the Cassandra NoSQL database; built a Kafka REST API to collect events from the front end.
  • Developed a data pipeline using Kafka to store data into HDFS.
  • Queried and analyzed data from Cassandra for quick searching, sorting, and grouping through CQL.
  • Experience with Amazon AWS for setting up Hadoop clusters.
  • Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Configured AWS EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
  • Implemented Talend jobs to extract data from different systems.
  • Loaded and transformed large sets of structured data into HDFS using Talend Big Data Studio.
  • Pushed data as delimited files into HDFS using Talend Big Data Studio.
  • Created source-to-target mappings in Talend.
  • Developed, validated, and maintained HiveQL queries.
  • Wrote and implemented Apache Pig scripts to load data from and store data into Hive.
  • Designed Hive tables to load data to and from external files.
  • Used Sqoop widely to import data from various systems into HDFS.
  • Wrote Sqoop incremental import jobs to move new/updated data from the database to HDFS.
  • Developed Pig scripts to transform raw data into curated data as specified by business users.
  • Used Pig built-in functions to convert fixed-width files to delimited files.
  • Worked on tuning Hive and Pig to improve performance and resolve performance issues in scripts, with a good understanding of joins, grouping, and aggregation.
  • Worked in a story-driven Agile development methodology and actively participated in daily Scrum meetings to track, optimize, and tailor features to customer needs.

Environment: Spark, Scala, Cloudera, Kafka, Cassandra, Oozie, MapReduce, Hive, Pig, Sqoop, Solr, HDFS, AWS, Talend.
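
A minimal sketch of the Kafka-to-Cassandra streaming flow described in this section: the topic, broker, keyspace, table, and column names are illustrative assumptions, and the example assumes the spark-streaming-kafka (0.8) and DataStax spark-cassandra-connector libraries are on the classpath.

    // Sketch: Spark Streaming consuming Kafka and persisting into Cassandra.
    // Topic, broker, keyspace, table, and column names are assumptions for illustration.
    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils
    import com.datastax.spark.connector.SomeColumns
    import com.datastax.spark.connector.streaming._   // adds saveToCassandra on DStreams

    object LearnerEventStream {
      // Hypothetical event shape parsed from each Kafka message (CSV payload assumed).
      case class LearnerEvent(learnerId: String, eventType: String, eventTs: Long)

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("learner-event-stream")
          .set("spark.cassandra.connection.host", "cassandra-host")   // assumed host
        val ssc = new StreamingContext(conf, Seconds(10))              // assumed batch interval

        val kafkaParams = Map("metadata.broker.list" -> "kafka-broker:9092")   // assumed broker
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("learner-events"))                     // assumed topic

        // Parse each message value, drop malformed records, and persist every micro-batch.
        stream
          .map(_._2)
          .flatMap { line =>
            line.split(",") match {
              case Array(id, evt, ts) => scala.util.Try(LearnerEvent(id, evt, ts.toLong)).toOption
              case _                  => None
            }
          }
          .saveToCassandra("learner_ks", "learner_events",
            SomeColumns("learner_id", "event_type", "event_ts"))       // assumed schema

        ssc.start()
        ssc.awaitTermination()
      }
    }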

Confidential, Atlanta, GA

Hadoop/Spark Developer

Responsibilities:

  • Used Spark Streaming APIs to perform the necessary transformations.
  • Utilized the Apache Hadoop environment provided by Hortonworks.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Used various Talend Hadoop components such as Hive, Pig, and Spark.
  • Worked extensively with the Spark Core and Spark SQL modules.
  • Experienced with Hadoop components such as Hadoop Common, HDFS, and MapReduce.
  • Expert in Pig, Hive, Sqoop, Flume, HBase, Oozie, and ZooKeeper.
  • Configured various views in Ambari such as the Hive view, Tez view, and YARN Queue Manager.
  • Utilized the Agile Scrum methodology to help manage and organize a team of developers, with regular code review sessions.
  • Used Flume to collect log data containing error messages from across the cluster and load it into HDFS.
  • Created HBase tables to load large sets of semi-structured data coming from various sources.
  • Used the Spark API over Hadoop YARN to perform analytics on data in Hive (see the sketch after this section).
  • Implemented a Spark/Elasticsearch stack to collect and analyze the logs produced by the Spark cluster.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data, and built a quick PoC on Spark in the initial stages of the product.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Implemented the Spark framework using Scala and Spark SQL for faster testing and processing of data.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Used Tableau for data visualization and report generation; created action filters, parameters, and calculated sets for preparing dashboards and worksheets.
  • Pushed data to RDBMS systems to serve as the source location from which Tableau imports it for reporting.
  • Generated Tableau reports for testing by connecting to the external MySQL database using the ODBC connector.
  • Created a Kafka-based messaging system to produce events and alerts for different systems.
  • Responsible for developing data pipelines to load data from servers such as SQL Server using Sqoop along with Kafka.
  • Involved in importing real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
  • Imported data from different sources such as HDFS/HBase into Spark RDDs; developed a data pipeline using Kafka and Storm to store data in HDFS and performed real-time analysis on the incoming data.
  • Worked on MongoDB using CRUD operations (create, read, update, delete), indexing, replication, and sharding features.
  • Extracted and updated data in MongoDB using the MongoDB import and export command-line utilities.
  • Implemented a PoC Spark cluster on AWS.
  • Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS.

Environment: Spark, Scala, Talend, Hive, Kafka, Ambari, Pig, HDFS, YARN, Flume, HBase, Sqoop, Oozie, ZooKeeper, Tableau, MongoDB, and AWS.
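
A minimal sketch of the Spark-over-YARN analytics on Hive data referenced above; the database, table, and column names ("weblogs.page_views", "page", "duration_ms") are assumptions for illustration, and the example assumes Spark 2.x with Hive support enabled (an older HiveContext-based version would look similar).

    // Sketch: Spark SQL analytics over an existing Hive table, submitted to YARN
    // with spark-submit --master yarn. Table and column names are assumptions.
    import org.apache.spark.sql.SparkSession

    object PageViewAnalytics {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("page-view-analytics")
          .enableHiveSupport()          // read tables registered in the Hive metastore
          .getOrCreate()

        // Top pages by view count and total view time, computed with Spark SQL.
        val topPages = spark.sql(
          """SELECT page, COUNT(*) AS views, SUM(duration_ms) AS total_ms
            |FROM weblogs.page_views
            |GROUP BY page
            |ORDER BY views DESC
            |LIMIT 20""".stripMargin)

        topPages.show(20, truncate = false)
        spark.stop()
      }
    }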

Confidential, Oklahoma City, OK

Hadoop Developer

Responsibilities:

  • Designed and implemented column-family schemas for Hive and HBase within HDFS.
  • Assigned schemas and created Hive tables.
  • Developed efficient Pig and Hive scripts with joins on datasets using various techniques.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Installed and configured Hive and wrote Hive UDFs (see the sketch after this section).
  • Installed and configured Flume, Pig, Sqoop, and HBase on the Hadoop cluster.
  • Experience migrating data from RDBMS and unstructured sources into HDFS using Sqoop and Flume.
  • Implemented best income logic using Pig scripts.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, the HBase database, and Sqoop.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Imported data using Sqoop to load data from MySQL into HDFS on a regular basis.
  • Strong knowledge of performance troubleshooting and tuning of Hadoop clusters.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Created external tables pointing to HBase to access data with a huge number of columns.
  • Involved in benchmarking Hadoop/HBase cluster file systems with various batch jobs and workloads.
  • Optimized the performance of HBase/Hive jobs and scheduled all HBase/Hive jobs using Oozie.
  • Utilized Agile methodologies to manage full life-cycle development of the project.
  • Followed Agile methodology, interacted directly with the client to provide and receive feedback on features, suggested/implemented optimal solutions, and tailored the application to customer needs.
  • Responsibilities included designing and delivering web-based J2EE solutions; used JavaScript for client-side validations.
  • Implemented static and dynamic web pages using JSP, JavaScript, and CSS.
  • Involved in requirements analysis, design, and estimation.

Environment: Hadoop HDFS, Hive, Pig, HBase, Flume, Sqoop, Oozie, Cloudera Manager, UNIX shell scripting, SQL, web services, microservices, J2EE, Java, MVC, IoC
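
The Hive UDF work mentioned above can be sketched as follows. A Hive UDF is a JVM class, commonly written in Java; it is shown in Scala here for consistency with the other examples, and the function name and masking behaviour are illustrative assumptions rather than the actual project logic.

    // Sketch: a simple Hive UDF using the classic org.apache.hadoop.hive.ql.exec.UDF API.
    // Registered in Hive with, e.g.:
    //   ADD JAR mask_udf.jar;
    //   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccountUDF';
    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    class MaskAccountUDF extends UDF {
      // Mask all but the last four characters of the input value.
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val s = input.toString
        new Text("*" * math.max(0, s.length - 4) + s.takeRight(4))
      }
    }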

Confidential, Bangalore, India Feb 2011 - Sept 2012

Role: Java/ETL Developer

Responsibilities:

  • Implemented new modules based on the Spring MVC architecture and Spring BeanFactory using IoC and AOP.
  • Worked within the Agile/Scrum development methodology, developing and testing applications during various iterations.
  • Developed web pages to display the account transactions and details pertaining to that account using JSP, Spring Web Flow, AJAX and CSS.
  • Responsible for configuring the Struts web-based application using struts-config.xml and web.xml.
  • Modified Struts configuration files per application requirements and developed web services for non-Java clients to obtain user information. Created Informatica Data Quality plans according to business requirements to standardize and cleanse data and validate addresses.
  • Integrated data quality plans as part of the ETL processes.
  • Debugged existing ETL processes and performed performance tuning to fix bugs.
  • Prepared ETL standards and naming conventions and wrote ETL flow documentation for Stage, ODS, and Mart.
  • Created mappings using different lookups (connected, unconnected, and dynamic) with different caches such as the persistent cache.
  • Used the Struts ActionServlet and ActionForm to design various JSP pages using the MVC2 architecture.
  • Analyzed and resolved functional and technical problems regarding Jive software.
  • Followed Agile methodology (TDD, Scrum) to satisfy customer requirements and wrote JUnit test cases for unit testing modules.
  • Involved in setting up Linux servers with JBoss, Apache, JDK 1.6, JIRA and Git.
  • Used the Java Persistence API (JPA) for persistence in the application.
  • Involved in implementing source code branching and performed check-in/check-out in Subversion.
  • Involved in unit and system testing and responsible for preparing test scripts for system testing.

Environment: Java/J2EE, JSP 2.1, XSL, Struts 1, AJAX, Hibernate 2, tag libraries, Spring Framework, Oracle, CSS, Web Services, Servlets 2., Linux, Scrum, ETL, Test-Driven Development, JUnit, Log4j, Ant, Unix, JMS.

Confidential

Jr. Java Developer

Responsibilities:

  • Involved in the design and development of an interactive project.
  • Developed the project using Agile methodology.
  • Implemented Action classes, Form classes and created struts-config files, validation files, tiles definitions, resource bundles for the entire application using Struts Framework.
  • Designed and developed UI components using JavaScript, JSP, JSTL and AJAX.
  • Developed DAO Service layer using Spring and Hibernate.
  • Developed application service components and configured beans using Spring IOC, creation of Hibernate mapping files and generation of database schema.
  • Developed test cases and performed unit testing using JUnit.
  • Used Log4j to write user-friendly log messages to the log files.
  • Involved in performing code reviews before delivering to QA.
  • Developed an API to write XML documents from a database; utilized XML and XSL transformations for dynamic web content and database connectivity.
  • Coded deployment descriptors using XML; generated JAR files were deployed on the Apache Tomcat server.
  • Involved in the development of the presentation layer and GUI framework in JSP; client-side validations were done using JavaScript. Involved in code reviews and mentored the team in resolving issues.
  • Participated in weekly design reviews and walkthroughs with project manager and development teams.
  • Provided technical guidance to business analysts, gathered requirements, and converted them into technical specifications/artifacts for developers.
  • Involved in coding with object-oriented concepts.
  • Understood and analyzed the requirements; involved in improving the existing design by minimizing dependencies between layers with the help of design patterns.

Environment: JDK 1.5, Spring 2.5, Hibernate 2.5, SOAP, JAX-WS, Struts 1.3, Oracle 10g, Linux, BEA WebLogic 9.2.
