
Spark/Hadoop Developer Resume


Waterloo, WI

SUMMARY:

  • 8+ years of professional IT experience, including 4+ years of Hadoop/Big Data experience, processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • 4+ years of experience in Hadoop components like MapReduce, Flume, Kafka, Pig, Hive, Spark, HBase, Oozie, Sqoop and Zookeeper.
  • Good understanding of Hadoop architecture, including YARN and components such as HDFS, Resource Manager, Node Manager, NameNode, and DataNode, as well as MR v1 & v2 concepts.
  • Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
  • Hands on experience in using various Hadoop distributions (Cloudera, Hortonworks, MapR).
  • Good knowledge of creating data pipelines in Spark using Scala.
  • Experience developing Spark programs for batch and real-time processing, including Spark Streaming applications.
  • Good knowledge on Spark components like Spark SQL , MLlib, Spark Streaming and GraphX.
  • Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and in performing data transformations using Spark Core.
  • Experience in using Spark SQL with various data sources such as JSON, Parquet, and Hive.
  • Strong knowledge of implementing data processing on Spark Core using Spark SQL and Spark Streaming.
  • Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, applying transformations, performing read/write operations, and saving results to output directories in HDFS (a minimal sketch follows this list).
  • Worked on Spark SQL: created DataFrames by loading data from Hive tables, prepared the data, and stored it in AWS S3.
  • Experience implementing real-time event processing and analytics with Spark Streaming on data from messaging systems such as Kafka.
  • Expertise in integrating data from multiple data sources using Kafka.
  • Knowledge of unifying data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
  • Kafka Deployment and Integration with Oracle databases.
  • Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Kafka.
  • Worked extensively with Hadoop distributions such as Cloudera and Hortonworks; good knowledge of the MapR distribution and Amazon EMR.
  • Experienced in moving data from Hive tables into Cassandra for real-time analytics, and in using Cassandra Query Language (CQL) to perform analytics on time-series data.
  • Good knowledge of writing custom UDFs in Hive and Pig for data filtering.
  • Expertise in writing Hive and Pig queries for data analysis to meet business requirements.
  • Hands-on experience in using Impala for data analysis.
  • Hands-on experience in using the data ingestion tools Sqoop and Flume.
  • Experience in importing and exporting data using Sqoop from RDBMS to HDFS and vice-versa.
  • Experience with Apache NiFi, including integrating Apache NiFi with Apache Kafka.
  • Hands-on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
  • Experience in configuring Zookeeper to coordinate servers in clusters and to maintain data consistency.
  • Worked on NoSQL databases like HBase, Cassandra, and MongoDB.
  • Good knowledge of job scheduling and coordination tools like Oozie and Zookeeper.
  • Experience in configuring various topologies in Storm to ingest and process data on the fly from multiple sources and aggregate it into a central Hadoop repository.
  • Experience with web UI development using jQuery, CSS, HTML, HTML5, XHTML, JavaScript.
  • Experience in working with the Spring and Hibernate frameworks in Java.
  • Extensive experience with databases such as Oracle, MySQL, and MS SQL Server, and with PL/SQL scripting.
  • Experience in using IDEs like Eclipse, NetBeans, and IntelliJ.
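
A minimal sketch of the kind of Spark SQL work described above, assuming hypothetical paths, table names, and columns: it reads JSON input, joins it with a Hive table, aggregates, and writes Parquet output.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object JsonToParquetPrep {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("JsonToParquetPrep")
          .enableHiveSupport()   // lets the job read existing Hive tables
          .getOrCreate()

        // Hypothetical JSON input path and Hive table
        val events    = spark.read.json("hdfs:///data/raw/events/")
        val customers = spark.table("warehouse.customers")

        // Filter, join with the Hive table, and aggregate
        val prepared = events
          .filter(col("event_type") === "purchase")
          .join(customers, Seq("customer_id"))
          .groupBy("customer_id", "region")
          .agg(sum("amount").as("total_amount"))

        // Save the result as Parquet in HDFS
        prepared.write.mode("overwrite").parquet("hdfs:///data/prep/purchases/")

        spark.stop()
      }
    }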

TECHNICAL SKILLS:

Big Data Space: Hadoop, MapReduce, Pig, Hive, HBase, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Elasticsearch, Solr, MongoDB, Cassandra, Avro, Storm, Parquet, Snappy

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR, Amazon EMR

Databases & warehouses: Teradata, SQL Server, MySQL, Oracle

Java Space: Core Java, J2EE, JDBC, JNDI, JSP, EJB, Struts, Spring Boot, REST, SOAP, JMS

Languages: Python, Java, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML, C/C++

Operating systems: UNIX, Linux, Mac OS, Windows variants

RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL, DB2

Version controls: GIT, SVN, CVS

ETL Tools: Informatica, Talend

PROFESSIONAL EXPERIENCE:

Confidential, Waterloo, WI

Spark/Hadoop Developer

Responsibilities:

  • Experienced in designing and deploying Hadoop clusters and different Big Data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Kafka, and Spark, with the Cloudera distribution.
  • Worked on Cloudera distribution and deployed on AWS EC2 Instances.
  • Hands-on experience using Cloudera Hue to import data through the GUI.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Performed Data Ingestion from multiple internal clients using Apache Kafka.
  • Worked on integrating Apache Kafka with Spark Streaming process to consume data from external REST APIs and run custom functions.
  • Involved in performance tuning of Spark jobs by caching data and taking full advantage of the cluster environment.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Implemented a real-time system with Kafka and Zookeeper.
  • Configured Spark Streaming to receive real-time data from Kafka and store it in HDFS (a sketch follows this list).
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Involved in running Hadoop streaming jobs to process terabytes of text data. Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
  • Configured, supported and maintained all network, firewall, storage, load balancers, operating systems, and software in AWS EC2.
  • Implemented Amazon EMR for Big Data processing on a Hadoop cluster of virtual servers running on Amazon EC2 and S3.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements; experienced with Sqoop for importing and exporting data from Oracle and MySQL.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala.
  • Good knowledge of data manipulation, tombstones, and compactions in Cassandra; well experienced in avoiding faulty writes and reads in Cassandra.
  • Performed data analysis with Cassandra using Hive External tables.
  • Designed the Column families in Cassandra.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Used Spark API over Hadoop YARN as execution engine for data analytics using Hive.
  • Implemented the YARN Capacity Scheduler in various environments and tuned configurations according to per-application job loads.
  • Configured a Continuous Integration system to execute suites of automated tests at desired frequencies using Jenkins, Maven, and Git.
  • Involved in loading data from the Linux filesystem to HDFS.
  • Followed Agile Methodologies while working on the project.
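
A hedged sketch of the Kafka-to-Spark Streaming-to-HDFS flow referenced above, using the spark-streaming-kafka-0-10 direct stream API; the broker addresses, topic name, consumer group, batch interval, and output path are all hypothetical.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToHdfs")
        val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

        // Hypothetical broker list, consumer group, and topic
        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092,broker2:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "hdfs-ingest",
          "auto.offset.reset"  -> "latest"
        )
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Set("events"), kafkaParams))

        // Write each non-empty micro-batch to a time-stamped HDFS directory
        stream.map(_.value()).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty())
            rdd.saveAsTextFile(s"hdfs:///data/streaming/events/${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }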

Environment: Hadoop, HDFS, Hive, Spark, Cloudera, AWS EC2, S3, EMR, Sqoop, Kafka, YARN, Shell Scripting, Impala, Scala, Pig, Cassandra, Oozie, Java, JUnit, Agile methods, Linux, MySQL, Elasticsearch, Kibana, Teradata.

Confidential, Flowood, MS

Data Engineer

Responsibilities:

  • Worked in a multi-clustered Hadoop ecosystem environment.
  • Created MapReduce programs using the Java API to filter unnecessary records and find unique records based on different criteria.
  • Optimized MapReduce programs using combiners, partitioners, and custom counters to deliver the best results.
  • Converted the existing relational database model to the Hadoop ecosystem.
  • Installed and configured Apache Hadoop, Hive and Pig environment.
  • Worked with Linux systems and RDBMS database on a regular basis so that data can be ingested using Sqoop.
  • Reviewed and managed all log files using HBase.
  • Designed and implemented HIVE queries and functions for evaluation, filtering, loading and storing of data.
  • Created Hive tables and worked on them using HiveQL.
  • Used Apache Kafka for the Data Ingestion from multiple internal clients.
  • Developed data pipeline using Flume and Spark to store data into HDFS.
  • Big data processing using Spark, AWS, and Redshift.
  • Involved in data acquisition, data pre-processing, and data exploration for a telecommunications project in Spark.
  • Involved in performing linear regression using Spark MLlib in Scala (a sketch follows this list).
  • Continuously monitored and managed the Hadoop cluster through HDP (Hortonworks Data Platform).
  • Implemented Frameworks using Java and Python to automate the ingestion flow.
  • Loaded CDRs into the Hadoop cluster from relational databases using Sqoop and from other sources using Flume.
  • Implemented data quality checks and transformations using Flume Interceptor.
  • Implemented collections and aggregation framework operations in MongoDB.
  • Processed large volumes of data in parallel using Talend functionality.
  • Efficiently handled periodic exporting of SQL data into Elasticsearch.
  • Involved in loading data from UNIX file system and FTP to HDFS.
  • Designed and implemented batch jobs using MR2, Pig, Hive, and Tez.
  • Used Apache Tez for highly optimized data processing.
  • Developed Hive queries to analyze the output data.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
  • Developed custom Pig UDFs for custom input formats to perform various levels of optimization.
  • Involved in maintaining the Hadoop clusters using Nagios server.
  • Used Pig to import semi-structured data coming from Avro files to make serialization faster.
  • Configured high-availability multi-core Solr servers using replication, request handlers, analyzers, and tokenizers.
  • Configured Solr server to index different content types like HTML, PDF, XML, XLS, DOC, DOCX and other types.
  • Loaded data into HBase using bulk load and non-bulk load.
  • Used Spark for fast processing of data in Hive and HDFS.
  • Performed batch processing of data sources using Apache Spark and Elasticsearch.
  • Used Zookeeper to provide coordination services to the cluster.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
  • Wrote the Shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Worked on Reporting tools like Tableau to connect with Hive for generating daily reports.
  • Utilized Agile Scrum methodology.
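
A minimal sketch of the Spark MLlib linear regression referenced above, using the DataFrame-based spark.ml API; the Hive table, feature columns, and label column are hypothetical.

    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.ml.regression.LinearRegression
    import org.apache.spark.sql.SparkSession

    object UsageRegression {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("UsageRegression")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical Hive table of pre-processed telecom features
        val data = spark.table("telecom.usage_features")

        // Assemble numeric feature columns into a single vector column
        val assembler = new VectorAssembler()
          .setInputCols(Array("minutes_used", "data_mb", "dropped_calls"))
          .setOutputCol("features")
        val prepared = assembler.transform(data)
          .withColumnRenamed("monthly_charge", "label")
          .select("features", "label")

        val Array(train, test) = prepared.randomSplit(Array(0.8, 0.2), seed = 42)

        // Fit the model and report basic fit statistics
        val model = new LinearRegression().setMaxIter(50).fit(train)
        println(s"coefficients=${model.coefficients} intercept=${model.intercept}")
        println(s"test RMSE=${model.evaluate(test).rootMeanSquaredError}")

        spark.stop()
      }
    }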

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Scala, Kafka, Flume, Sqoop, Hortonworks, AWS, Redshift, Oozie, Zookeeper, Elasticsearch, Avro, Python, Shell Scripting, SQL, Talend, Spark, HBase, MongoDB, Linux, Solr, Ambari.

Confidential, Summit, NJ

Java/Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Migrated existing SQL queries to HiveQL queries to move to big data analytical platform.
  • Integrated the Cassandra file system with Hadoop using MapReduce to perform analytics on Cassandra data.
  • Installed and configured Cassandra DSE multi-node, multi-data center cluster.
  • Designed and implemented a 24-node Cassandra cluster for a single-point inventory application.
  • Analyzed the performance of the Cassandra cluster using nodetool tpstats and cfstats for thread and latency analysis.
  • Implemented real-time analytics on Cassandra data using the Thrift API.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Loaded and transformed large sets of data into HDFS using Hadoop fs commands.
  • Scheduled Oozie workflow engine to run multiple Hive and Pig jobs, which independently run with time and data availability.
  • Implemented UDFs and UDAFs in Java and Python for Hive to process data that cannot be handled with Hive's built-in functions (a simplified sketch follows this list).
  • Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop for analysis, visualization, and report generation.
  • Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
  • Supported setting up and updating configurations for implementing scripts with Pig and Sqoop.
  • Designed the logical and physical data models and wrote DML scripts for the Oracle 9i database.
  • Used Hibernate ORM framework with Spring framework for data persistence.
  • Wrote test cases in JUnit for unit testing of classes.
  • Developed templates and screens in HTML and JavaScript.
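
A simplified sketch of a custom Hive UDF of the kind referenced above. The UDFs on this project were written in Java and Python; this sketch uses Scala for consistency with the other sketches (any JVM class with an evaluate method works with the classic Hive UDF API), and the class name and masking rule are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Masks all but the last four digits of an account number (hypothetical rule)
    class MaskAccountNumber extends UDF {
      def evaluate(input: Text): Text =
        if (input == null) null
        else new Text(input.toString.replaceAll("\\d(?=\\d{4})", "*"))
    }

    // Registered and used from Hive roughly as:
    //   ADD JAR hdfs:///udfs/mask-udf.jar;
    //   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccountNumber';
    //   SELECT mask_account(account_number) FROM accounts;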

Environment: Java, HDFS, Cassandra, MapReduce, Sqoop, JUnit, HTML, JavaScript, Hibernate, Spring, Pig, Hive.

Confidential, Mountain View, CA

Hadoop Developer

Responsibilities:

  • Worked on Spark SQL: created DataFrames by loading data from Hive tables, prepared the data, and stored it in AWS S3 (a sketch follows this list).
  • Created Sqoop jobs to import the data from DB2 to HDFS.
  • Exported data using Sqoop into HDFS and Hive for report analysis.
  • Used Oozie Workflow engine to run multiple Hive and Sqoop jobs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
  • Experienced in working with Sqoop Scripts.
  • Experienced in creating Hive external tables in HDFS.
  • Created visual reports using Tableau.
  • Performed source data transformations using Hive.
  • Created partitions in Hive tables and worked on them using HQL.
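
A short sketch of the Hive-to-S3 preparation flow referenced above: a DataFrame is loaded from a Hive table, lightly cleaned, and written to S3 partitioned by date. The table name, columns, and S3 bucket are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HiveToS3Prep {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToS3Prep")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical Hive table loaded as a DataFrame
        val orders = spark.table("sales.orders")

        // Light preparation: drop malformed rows, derive a date column for partitioning
        val prep = orders
          .filter(col("order_id").isNotNull)
          .withColumn("order_date", to_date(col("order_ts")))

        // Write the prepared data to S3 as Parquet, partitioned by date
        prep.write
          .mode("overwrite")
          .partitionBy("order_date")
          .parquet("s3a://example-prep-bucket/orders/")

        spark.stop()
      }
    }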

Environment: Spring Tool Suite (STS), Spark, Scala, Sqoop, Bash scripting, Bamboo, AWS, GitHub, Hive, MapReduce, DB2, Shell scripting, Oozie, Python.

Confidential

Java Developer

Responsibilities:

  • Participated in all the phases of the Software development life cycle (SDLC) which includes Development, Testing, Implementation and Maintenance.
  • Involved in collecting client requirements and preparing the design documents.
  • Implemented Spring MVC architecture and the Spring BeanFactory using IoC and AOP concepts.
  • Developed Java classes to execute the business logic and collect input data from users, using Java and Oracle.
  • Involved in creation of scripts to create, update and delete data from the tables.
  • Followed Agile methodology to analyze, define, and document the application in support of functional and business requirements.
  • Wrote JSPs using HTML tags to design the UI for different pages.
  • Extensively used OOD concepts in overall design and development of the system.
  • Developed user interface using Spring JSP to simplify the complexities of the application.
  • Responsible for Development, unit testing and implementation of the application.
  • Used Agile methodology to design, develop and deploy the changes.
  • Extensively used tools like AccVerify, Checkstyle, and Clockworks to check the code.

Environment: Java, JSP, JDBC, HTML, XSL, Spring, CSS, JavaScript, Oracle 8i, XML, WebLogic

Confidential

Java developer

Responsibilities:

  • Installed, configured, and deployed software after gathering all required specifications.
  • Performed quality assurance testing.
  • Helped design the application using the Spring MVC framework, with front-end interactive page design using HTML, JSP, JSTL, CSS, JavaScript, jQuery, and AJAX.
  • Implemented JavaScript, shell scripts, and JSP for front-end and server-side validations.
  • Involved in writing SQL queries for fetching data from Oracle database.
  • Developed a multi-tiered web application using J2EE standards.
  • Used JIRA to track bugs.
  • Used Apache Axis to develop web services and SOAP protocol for web services communication.
  • Implemented persistence layer using Spring JDBC to store and update data in database.
  • Used the Apache Tomcat application server to deploy and configure the application.
  • Used JUnit to test the persistence and service tiers; involved in unit test case preparation.
  • Hands-on experience with software configuration/change control processes and tools like Subversion (SVN), Git, CVS, and ClearCase.
  • Built and deployed the application using Maven.
  • Followed Agile and Scrum methodologies.
  • Involved in sprint planning, code review, and daily standup meetings to discuss the progress of the application.

Environment: HTML, AJAX, Servlets, JSP, SQL, JavaScript, CSS, XML, SOAP, Tomcat Server, Hibernate, JDBC, Maven, MVC, Agile, Git, JIRA, SVN.
