
Big Data / Spark Developer Resume


Reston, VA

PROFESSIONAL SUMMARY:

  • Over 5 years of experience in Analysis, Design, Development, Testing, Implementation, Maintenance and Enhancements on various IT projects, with Big Data experience implementing end-to-end Hadoop solutions.
  • Experience in working in environments using Agile (SCRUM), RUP and Test-Driven development methodologies.
  • Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient data processing.
  • Extensive experience in installing, configuring and using ecosystem components like Hadoop MapReduce, HDFS, Sqoop, Pig, Hive, Impala & Spark.
  • Expertise in using J2EE application servers such as IBM WebSphere and JBoss, and web servers like Apache Tomcat.
  • Experience with different Hadoop distributions such as Cloudera (CDH3 & CDH4) and Hortonworks Data Platform (HDP).
  • Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java; extended Hive and Pig core functionality by writing custom UDFs.
  • Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Diverse experience utilizing Java tools in business, Web, and client-server environments including Java Platform, J2EE, EJB, JSP, Java Servlets, Struts, and Java Database Connectivity (JDBC) technologies.
  • Good understanding of integrating various data sources such as RDBMS, spreadsheets, text files, JSON and XML files.
  • Implemented Service Oriented Architecture (SOA) using Web Services and JMS (Java Messaging Service).
  • Implemented J2EE Design Patterns such as MVC, Session Façade, DAO, DTO, Singleton Pattern, Front Controller and Business Delegate.
  • Experienced in developing web services with XML-based technologies such as SOAP, Axis, UDDI and WSDL.
  • Experienced in MVC (Model View Controller) architecture and various J2EE design patterns like singleton and factory design patterns.
  • Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala) and NoSQL databases like MongoDB, HBase and Cassandra.
  • Experience with Hadoop MRv1 and Hadoop MRv2 (YARN) architectures.
  • Strong knowledge of SQL and PL/SQL for writing stored procedures and functions, and experience writing unit test cases using JUnit.
  • Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into data Warehouse and Data Mart.
  • Strong experience working with databases like Oracle 12c, SQL Server and MySQL.
  • Hands-on experience with the latest UI stack, including HTML, CSS, mobile-friendly and responsive design, and user-centric design.
  • Experience in developing web-based enterprise applications using Java, J2EE, Servlets, JSP, EJB, JDBC, Spring IoC, Spring AOP, Spring MVC, Spring Web Flow, Spring Boot, Spring Security, Spring Batch, Spring Integration, Web Services (SOAP and REST) and ORM frameworks like Hibernate.
  • Strong knowledge of Hadoop ecosystem components including HDFS, Hive, Oozie, HBase, Pig, Sqoop and Zookeeper.
  • Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF and Hibernate.
  • Expertise in JavaScript, JavaScript MVC patterns, Object Oriented JavaScript Design Patterns and AJAX calls.
  • Installation, configuration and administration experience with Big Data platforms such as Cloudera Manager (Cloudera) and MCS (MapR).
  • Expertise in using XML related technologies such as XML, DTD, XSD, XPATH, XSLT, DOM, SAX, JAXP, JSON and JAXB.
  • Experience in using Ant and Maven for building and deploying projects to servers, and JUnit and log4j for testing and debugging.

PROFESSIONAL EXPERIENCE:

Confidential, Reston, VA

Big Data / Spark Developer

Responsibilities:

  • Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
  • Used Sqoop to import data from relational databases such as MySQL and Oracle.
  • Involved in importing structured and unstructured data into HDFS.
  • Responsible for fetching real-time data using Kafka and processing it using Spark and Scala.
  • Worked on Kafka to import real-time weblogs and ingested the data into Spark Streaming.
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (see the streaming sketch after this list).
  • Worked on building and implementing a real-time streaming ETL pipeline using the Kafka Streams API.
  • Worked on Hive to support web interfacing and stored the data in Hive tables.
  • Migrated MapReduce programs to Spark transformations using Spark and Scala.
  • Experienced with SparkContext, Spark SQL and Spark on YARN.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing (see the Spark SQL sketch after this list).
  • Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
  • Implemented Hive partitioning and bucketing on the collected data in HDFS.
  • Involved in data querying and summarization using Hive and Pig and created UDFs, UDAFs and UDTFs.
  • Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
  • Extensively used ZooKeeper as a backup server and job scheduler for Spark jobs.
  • Developed traits, case classes and related constructs in Scala.
  • Developed Spark scripts using Scala shell commands as per the business requirement.
  • Worked on Cloudera distribution and deployed on AWS EC2 Instances.
  • Experienced in loading real-time data into NoSQL databases such as Cassandra.
  • Well versed in data manipulation and compaction in Cassandra.
  • Experience in retrieving data from the Cassandra cluster by running CQL (Cassandra Query Language) queries.
  • Worked on connecting the Cassandra database to the Amazon EMR File System (EMRFS) for storing the data in S3.
  • Used Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Deployed the project on Amazon EMR with S3 connectivity for backup storage.
  • Well versed in using Elastic Load Balancing with Auto Scaling for EC2 servers.
  • Configured workflows that involve Hadoop actions using Oozie.
  • Used Python for pattern matching in build logs to format warnings and errors.
  • Coordinated with the SCRUM team in delivering agreed user stories on time for every sprint.
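
A minimal Scala sketch of the kind of Kafka Direct Stream pipeline described above, assuming Spark 2.x with the spark-streaming-kafka-0-10 and spark-cassandra-connector libraries on the classpath. The broker address, topic, keyspace and table names, and the weblog record layout are illustrative assumptions, not project specifics.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.kafka.common.serialization.StringDeserializer
import com.datastax.spark.connector._

object WeblogStream {
  // Illustrative weblog record: "timestamp,url,status"
  case class Weblog(ts: String, url: String, status: Int)

  def parseWeblog(line: String): Option[Weblog] = line.split(",") match {
    case Array(ts, url, status) => scala.util.Try(Weblog(ts, url, status.toInt)).toOption
    case _                      => None // drop malformed lines
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WeblogStream")
      .set("spark.cassandra.connection.host", "127.0.0.1") // illustrative Cassandra host
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",            // illustrative broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "weblog-consumers",
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream over the weblog topic (topic name is illustrative)
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Parse each record, drop bad lines, and persist the result to Cassandra
    stream.map(_.value)
      .flatMap(parseWeblog)
      .foreachRDD(rdd => rdd.saveToCassandra("analytics", "weblogs"))

    ssc.start()
    ssc.awaitTermination()
  }
}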
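
A companion Spark SQL sketch for reading Hive tables into Spark and writing results back as a partitioned, bucketed Hive table, assuming Spark 2.x with Hive support enabled. The database, table and column names are illustrative.

import org.apache.spark.sql.SparkSession

object HiveQuerySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveQuerySketch")
      .enableHiveSupport()       // connect to the Hive metastore
      .getOrCreate()

    // Query an existing Hive table through Spark SQL (names are illustrative)
    val errors = spark.sql(
      "SELECT event_date, url, status FROM analytics.weblogs WHERE status >= 400")

    // Persist the result as a partitioned, bucketed Hive table
    errors.write
      .partitionBy("event_date")
      .bucketBy(8, "url")
      .sortBy("url")
      .format("parquet")
      .saveAsTable("analytics.weblog_errors")

    spark.stop()
  }
}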

Environment: Hadoop YARN, Spark SQL, Spark Streaming, GraphX, AWS S3, AWS EMR, Scala, Python, Kafka, Hive, Pig, Sqoop, Cassandra, Cloudera, Oracle 10g, Linux.

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:

  • Participated in the complete Software Development Life Cycle including Requirement, Analysis, Design, Implementation, Testing and Maintenance.
  • Worked on a Hadoop cluster that ranged from 30 nodes in development to 40 nodes in pre-production and 140 nodes in production.
  • Responsible for managing data coming from different sources and importing structured and unstructured data.
  • Implemented complex MapReduce programs to perform map-side joins using the distributed cache in Java.
  • Developed MapReduce programs and Hive queries to analyze shipping patterns and the customer satisfaction index over historical data.
  • Experience writing Pig user-defined functions and Hive UDFs (see the UDF sketch after this list).
  • Pig scripts utilized SequenceFiles and HCatalog for better performance.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log files.
  • Used Sqoop to import data from RDBMS into HDFS to ensure data reliability.
  • Implemented a POC using Apache Impala for data processing on top of Hive.
  • Responsible for managing and reviewing Hadoop log files. Designed and developed a data management system using MySQL.
  • Developed Pig scripts to perform transformations, event joins, filtering of bot traffic and pre-aggregations before storing the data in HDFS.
  • Involved in developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
  • Supported MapReduce programs running on the cluster.
  • Installed and configured Proof of Concepts (POC) environments for Map Reduce, Hive, Oozie, Flume, HBase and other major components of Hadoop distributed system.
  • Used Flume to transport large amounts of streaming data into HBase.
  • Developed MapReduce programs in Java for data analysis and data cleansing.
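
A minimal sketch of a Hive UDF of the kind mentioned above. UDFs of this style are typically written in Java; it is shown in Scala here for consistency with the other sketches, and the class name, normalization rule and table name are illustrative.

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Normalizes free-text status values, e.g. " in transit " -> "IN_TRANSIT"
class NormalizeStatusUDF extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase.replaceAll("\\s+", "_"))
  }
}

Packaged into a JAR, it would be registered and used from Hive roughly as:

ADD JAR normalize-udf.jar;
CREATE TEMPORARY FUNCTION normalize_status AS 'NormalizeStatusUDF';
SELECT normalize_status(status) FROM shipments;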

Environment: Hadoop 1.0.0 and Hadoop 2.0.0, HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, HBase, Java, Flume 1.2.0, Eclipse IDE, CDH3.

Confidential, Pittsburgh, PA

Hadoop Developer

Responsibilities:

  • Worked on Spark SQL to handle structured data in Hive.
  • Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generated visualizations using Tableau.
  • Worked on complex MapReduce programs to analyze data residing on the cluster.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Wrote Hive UDFs to sort struct fields and return complex data types.
  • Worked in AWS environment for development and deployment of custom Hadoop applications.
  • Involved in creating Shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and move the data inside and outside of HDFS.
  • Created files and tuned SQL queries in Hive using Hue.
  • Involved in collecting and aggregating large amounts of log data using Storm and staging data in HDFS for further analysis.
  • Created the Hive external tables using Accumulo connector.
  • Managed real-time data processing and real-time data ingestion into MongoDB and Hive using Storm.
  • Created custom Solr query components to optimize search matching.
  • Developed Spark scripts using PySpark shell commands as per the requirement.
  • Stored the processed results in the data warehouse and maintained the data using Hive.
  • Worked with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files (see the Spark SQL sketch after this list).
  • Created Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Worked with NoSQL databases like MongoDB, creating collections to load large sets of semi-structured data.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs, which run independently with time and data availability.
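
A small Scala sketch of the Spark SQL work described above: reading CSV and text files, registering them as temporary views, and querying them alongside an existing Hive table. Spark 2.x with Hive support is assumed; the file paths, column names and the warehouse.customers table are illustrative.

import org.apache.spark.sql.SparkSession

object FormatQuerySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FormatQuerySketch")
      .enableHiveSupport()
      .getOrCreate()

    // CSV with a header row; schema is inferred (path is illustrative)
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/orders.csv")
    orders.createOrReplaceTempView("orders")

    // Plain text file, one record per line (exposed as a single "value" column)
    spark.read.textFile("hdfs:///data/app.log").createOrReplaceTempView("logs")
    spark.sql("SELECT count(*) AS error_lines FROM logs WHERE value LIKE '%ERROR%'").show()

    // Join the CSV view with an existing Hive table (names are illustrative)
    val enriched = spark.sql(
      """SELECT o.order_id, o.amount, c.region
        |FROM orders o
        |JOIN warehouse.customers c ON o.customer_id = c.customer_id""".stripMargin)

    enriched.show(20)
    spark.stop()
  }
}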

Environment: HDFS, MapReduce, Storm, Hive, Pig, Sqoop, MongoDB, Apache Spark, Python, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, UNIX Shell scripts, HUE, SOLR, GIT, Maven.
