Hadoop/Spark Developer Resume

Palo Alto, CA

SUMMARY

  • 8+ years of experience across IT sectors such as banking and telecom services, including hands-on experience with Big Data technologies.
  • 4+ years of Big Data Ecosystem experience in ingestion, storage, querying, processing and analysis of big data.
  • In-depth understanding of Hadoop architecture and its components, including HDFS, YARN, NameNode, DataNode, ResourceManager and NodeManager.
  • Experience in building and maintaining multiple Hadoop clusters of different sizes and configurations.
  • Experience in importing and exporting data between HDFS and relational database management systems using Sqoop. Good knowledge of job scheduling and monitoring tools such as Oozie.
  • Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Expertise in Apache Spark development (Spark SQL, Spark Streaming, MLlib, GraphX, Zeppelin, HDFS, YARN and NoSQL).
  • Excellent understanding of NoSQL databases such as HBase and Cassandra.
  • Hands-on experience with SequenceFiles, RCFiles, combiners, counters, dynamic partitioning and bucketing for best practices and performance improvement.
  • Good experience implementing Apache Spark and Spark Streaming projects using Scala and Spark SQL.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames and Spark Streaming; a brief illustration follows this list.
  • Experience in managing Hadoop clusters and services using Cloudera Manager, an end-to-end tool for Hadoop operations.
  • Experienced in Extraction, Transformation and Loading (ETL) processes driven by business needs, using Oozie workflows to execute Java, Hive, Shell and SSH actions.
  • Experience in Object Oriented Analysis and Design (OOAD) and development of software using UML Methodology.
  • Experience in database design, using PL/SQL to write stored procedures, functions and triggers, and writing queries for Oracle 10g.
  • Excellent Java development skills using J2EE, Servlets and JUnit; familiar with popular frameworks such as Spring MVC and AJAX.
  • Knowledge of the Software Development Life Cycle (SDLC) and detailed design documentation.
  • Good Experience in database performance tuning.
  • Hands-on experience with shell scripting and UNIX.
  • Experienced with distributed message brokers (such as Kafka).
  • Quick learner with excellent written and verbal communication, presentation and problem-solving skills; a good team player able to work in fast-paced environments.
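
As a brief illustration of the Spark SQL and DataFrame work summarized above, the sketch below re-expresses a simple HiveQL aggregation with the DataFrame API. It is a minimal, hypothetical example: the transactions table and its columns are placeholders, not details from any engagement listed here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark resolve tables registered in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("HiveToSparkSqlSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL (executed by Hive as a MapReduce job):
    //   SELECT account_id, SUM(amount) FROM transactions GROUP BY account_id;
    // Equivalent DataFrame pipeline, executed by Spark:
    val totals = spark.table("transactions") // hypothetical Hive table
      .groupBy("account_id")
      .agg(sum("amount").as("total_amount"))

    totals.show(20)
    spark.stop()
  }
}
```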

TECHNICAL SKILLS

Hadoop Distributions: Apache, Cloudera CDH4 and CDH5

Big Data Ecosystem: Apache Hadoop (HDFS/MapReduce), Hive, Pig, Sqoop, Zookeeper, Oozie, Hue, Spark, Spark SQL, Apache Kafka

NoSQL Databases: HBase, Cassandra

Languages: Java/J2EE, SQL, Python, Scala, PL/SQL, XML

Databases: Oracle 10g/9i/8i, DB2, MySQL, MS SQL Server

Development Tools: Eclipse, Rational Rose

Software Engineering: Agile, Scrum Methodology

Version Control Systems: GIT, SVN

Amazon Web Services: EC2, S3, CloudWatch, EMR, Lambda, SimpleDB, SQS, SNS

PROFESSIONAL EXPERIENCE

Confidential, Palo Alto, CA

Hadoop/Spark Developer

Responsibilities:

  • Involved in analyzing business, system and data mapping requirements.
  • Involved in developing ETL data pipelines for real-time streaming data using Kafka and Spark; a brief sketch follows this list.
  • Applied transformations on raw input data consumed from Kafka topics and wrote the transformed data into new topics for further processing.
  • Used Spark for interactive queries, processing of streaming data and integration with NoSQL databases for high volumes of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
  • Developed UDFs in Java for Hive and Pig, and worked on reading multiple data formats on HDFS using Scala.
  • Developed a data pipeline using Kafka, HBase, Spark and Hive to ingest, transform and analyze data.
  • Developed a client-server website using Java for clients to access required information.
  • Working knowledge of GIT and ANT/Maven for project dependency management, builds and deployment.
  • Involved in file movements between HDFS and AWS S3, worked extensively with S3 buckets, and used big data tooling to load large volumes of source files from S3 into Redshift.
  • Extracted data from Teradata into HDFS, databases and dashboards using Spark Streaming.
  • Accessed the Kafka cluster to consume data into Hadoop and analyzed it by running Hive queries and Pig scripts.
  • Created, modified and executed DDL and ETL scripts for de-normalized tables to load data into Hive and AWS Redshift tables.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Analyzed the performance of Spark streaming and batch jobs using Spark tuning parameters.
  • Analyzed SQL scripts and designed solutions to implement them using Scala.
  • Used Zookeeper for various types of centralized configuration; wrote applications that connect to the Zookeeper client and create services and jobs, with each job assigned to a service for processing.
  • Designed and developed ETL workflows using Oozie and automated them using Autosys.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
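
The sketch below illustrates the Kafka-to-Kafka streaming ETL pattern referenced in the bullets above, written with Spark Structured Streaming for brevity; the original pipeline, built on CDH 4/5, may well have used the older DStream API instead. The broker address, topic names, checkpoint path and the upper-casing transformation are all placeholder assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Requires the spark-sql-kafka-0-10 connector on the classpath.
object KafkaEtlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KafkaEtlSketch").getOrCreate()
    import spark.implicits._

    // Consume the raw topic (broker and topic names are placeholders).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "raw-events")
      .load()

    // Apply a stand-in transformation: normalize the payload to upper case.
    val transformed = raw
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .withColumn("value", upper($"value"))

    // Publish the transformed records to a new topic for downstream jobs.
    transformed.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("topic", "clean-events")
      .option("checkpointLocation", "/tmp/kafka-etl-checkpoint")
      .start()
      .awaitTermination()
  }
}
```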

Environment: Java, Apache Hadoop, Apache Kafka, Apache Zeppelin, AWS, Spark RDD, Scala, HBase, Hive, Pig, Oozie, Redshift, Zookeeper, Cloudera CDH 4/5 Distribution, Spark Streaming, Eclipse, SQL, J2EE.

Confidential, Foster City, CA

Hadoop Developer

Responsibilities:

  • Involved in integrating the Hadoop cluster with the Spark engine to perform batch operations.
  • Migrated data coming from different sources into the Spark cluster through spark-submit jobs.
  • Applied transformations on data loaded into Spark RDDs and performed in-memory computation to generate the output.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala; initial versions were done in Python (PySpark).
  • Worked on configuring Zookeeper and Kafka clusters.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
  • Developed Scala scripts using DataFrames/SQL/Datasets and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop; an aggregation sketch follows this list.
  • Worked with Avro and Parquet file formats and used various compression techniques to make the most of HDFS storage.
  • Used Spark Streaming to fetch Twitter data with ASU hashtags to perform sentiment analysis.
  • Worked on a POC moving data from Kafka and Storm to HDFS and Hive.
  • Used Hive to analyze partitioned data and compute various metrics for reporting.
  • Experience with Amazon Web Services (AWS) cloud services such as Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic MapReduce (EMR), SimpleDB, CloudWatch, SNS, SQS and Lambda.
  • Imported data from different sources like HDFS and HBase into Spark RDDs.
  • Reduced the latency of Spark jobs by tweaking Spark configurations and applying other performance optimization techniques.
  • Created topics on the Desktop portal using Spark Streaming with Kafka and Zookeeper.
  • Used Apache Splunk add-ons to enhance data ingestion and analysis of log data.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Pig.
  • Worked on various production issues during month-end support and provided resolutions without missing any SLA.
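
A minimal sketch of the kind of DataFrame aggregation over Parquet data described above. The HDFS paths, column names and the choice of Snappy compression are illustrative assumptions rather than project specifics.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ParquetAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ParquetAggregationSketch").getOrCreate()

    // Read Parquet input from HDFS (path is a placeholder).
    val events = spark.read.parquet("hdfs:///data/events")

    // Daily counts per event type: the kind of aggregate fed back to reporting.
    val daily = events
      .groupBy(col("event_date"), col("event_type"))
      .agg(count("*").as("event_count"))

    // Snappy-compressed Parquet keeps HDFS storage compact while staying splittable.
    daily.write
      .option("compression", "snappy")
      .mode("overwrite")
      .parquet("hdfs:///data/daily_event_counts")

    spark.stop()
  }
}
```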

Environment: Apache Hadoop, Apache Spark, Scala, AWS, Spark RDD, HBase, Apache Splunk, MapReduce, Hive, Pig, Oozie, Zookeeper, Kafka, Spark Streaming, Python, SQL, Linux, cron jobs.

Confidential, Charlotte, NC

Hadoop/Spark Developer

Responsibilities:

  • Worked with the source team to understand the format and delimiters of the data file.
  • Ran periodic MapReduce jobs to load data from Cassandra into Hadoop.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Created Hive tables as per requirements, internal or external, defined with appropriate static and dynamic partitions for efficiency; a brief sketch follows this list.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suits the current requirements.
  • Loaded the output data into Cassandra using bulk load.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Developed design documents considering all possible approaches and identifying the best of them.
  • Implemented log-aggregation and transforming data for analytics using Apache Kafka.
  • Used Slick to query and store data in the database in idiomatic Scala using the Scala collections framework.
  • Experience in deploying applications on heterogeneous application servers such as Tomcat, WebLogic, IBM WebSphere and Oracle Application Server.
  • Experienced with performing CRUD operations in HBase.
  • Transformed the data using Hive and Pig for the BI team to perform visual analytics according to client requirements.
  • Performed analysis with the data visualization tool Tableau. Wrote Pig scripts for data processing.
  • Created a Java component for Athena to query back from S3 Parquet files.
  • Developed Spark applications for the entire batch processing flow using Scala.
  • These new data items are used for further analytics and reporting, with Cognos reports as the BI component.
  • Developed scripts that automated data management end to end and synced up all the clusters.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
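
The sketch below shows a Spark batch job writing a dynamically partitioned Hive table, mirroring the static/dynamic partitioning approach described above. The table names and the event_date partition column are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedHiveWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedHiveWriteSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Enable Hive-style dynamic partitioning for the write below.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Source table and partition column are placeholders.
    val cleaned = spark.table("staging.events_raw")
      .filter("event_date IS NOT NULL")

    // Each distinct event_date value becomes its own partition directory,
    // so queries filtered on event_date scan only the matching partitions.
    cleaned.write
      .mode("append")
      .partitionBy("event_date")
      .saveAsTable("analytics.events")

    spark.stop()
  }
}
```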

Environment: Big Data/Hadoop, Cloudera CDH 3/4 Distribution, HDFS, Kafka, MapReduce, Cassandra, Hive, Oozie, Pig, Python, Java, Shell Scripting, Scala, MySQL, IBM WebSphere, Tomcat and Tableau.

Confidential

Hadoop Developer

Responsibilities:

  • Explored and used Hadoop ecosystem features and architectures.
  • Worked with the business team to gather requirements and new support features.
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
  • Participated in architectural and design decisions with the respective teams. Developed an in-memory data grid solution across conventional and cloud environments using Oracle Coherence.
  • Monitored and troubleshot Hadoop jobs using the YARN Resource Manager, and EMR job logs using Genie and Kibana.
  • Involved in Configuring core-site.xml and mapred-site.xml according to the multi node cluster environment.
  • Wrote programs using scripting languages like Pig to manipulate data.
  • Involved in creating Hive tables, loading structured data and writing Hive queries that run internally as MapReduce jobs.
  • Monitored the running MapReduce programs on the cluster.
  • Developed programs in Java and Scala/Spark to reformat data extracted from HDFS for analysis; a brief sketch follows this list.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Designed the database and created tables; wrote complex SQL queries and stored procedures as per the requirements.
  • Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
  • Prepared Shell scripts to get the required info from the logs.
  • Responsible for managing data coming from different sources.
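
A brief sketch of the HDFS data-reformation pattern mentioned above: raw delimited extracts are parsed into a typed Dataset and persisted in a structured format for analysis. The record layout, pipe delimiter and paths are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

// The record layout is hypothetical; real fields would come from the source spec.
case class CallRecord(subscriberId: String, callStart: String, durationSec: Int)

object HdfsReformatSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("HdfsReformatSketch").getOrCreate()
    import spark.implicits._

    // Raw pipe-delimited extracts landed on HDFS (path is a placeholder).
    val records = spark.read.textFile("hdfs:///landing/call_records")
      .map(_.split('|'))
      .filter(_.length == 3) // drop malformed rows
      .map(f => CallRecord(f(0), f(1), f(2).trim.toInt))

    // Persist the structured result for downstream Hive analysis.
    records.toDF().write.mode("overwrite").parquet("hdfs:///curated/call_records")

    spark.stop()
  }
}
```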

Environment: Apache Hadoop, Java, Scala, Eclipse, MySQL, Pig, Hive, Sqoop, EMR, Linux, Oozie, Shell Scripting.

Confidential

Java Developer

Responsibilities:

  • Involved in all layers: the Presentation, Business Logic and Data Access layers.
  • The front end was designed using HTML, CSS, JSP, JSTL and Struts.
  • Involved in developing the CSS sheets for the UI Components.
  • Used Struts tiles-definitions for laying out the different sections of the page.
  • Used the Struts Validation framework for validating the UI and the Tiles framework, and implemented internationalization (i18n).
  • Involved in writing Struts form-beans for transferring the data from Controller to the Model.
  • Involved in developing Hibernate mapping files and POJOs for Hibernate persistence layer.
  • Used Hibernate as ORM tool for accessing database.
  • Implemented different modules of the Spring Framework such as IoC, DAO and O/R mapping.
  • Implemented file upload and download functionality using Struts and Servlets.
  • Integrated and configured the Struts, Spring and Hibernate framework environment.
  • Used Log4j for logging in the application.
  • Involved in front end validation using Struts Validation and JavaScript.
  • Designed and configured core XML files for the Struts implementation.
  • Used the ANT tool for creating and deploying the .war files.
  • Involved in unit and system testing using JUnit (TFD) before placing the application for acceptance testing.
  • Used JDBC connection pooling for accessing the Oracle 10g database; a brief sketch follows this list.
  • Used SOAP Web Services (synchronous and asynchronous) for checking customer information such as names (NA) and credit checks.
  • Used Rational Clear Case for version control.
  • Extensively used RAD 6.0 with various plugins for implementing various modules.
  • Developed Ant build scripts for deploying the project on WebSphere 6.0 application server.
  • Involved in unit testing of different components using JUnit.
  • Used MQ Series to let different applications work together.
  • Responsible for the support and maintenance of the application.
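
As a rough illustration of the JDBC connection pooling mentioned above (shown in Scala for consistency with the other sketches): the original application relied on a container-managed pool under WebSphere, so this sketch substitutes the standalone HikariCP library purely to show the borrow-use-return pattern. The JDBC URL, credentials, table and query are placeholders.

```scala
import com.zaxxer.hikari.{HikariConfig, HikariDataSource}

object PooledJdbcSketch {
  def main(args: Array[String]): Unit = {
    // Pool configuration: URL, credentials and pool size are placeholders.
    val config = new HikariConfig()
    config.setJdbcUrl("jdbc:oracle:thin:@//dbhost:1521/ORCL")
    config.setUsername("app_user")
    config.setPassword("app_password")
    config.setMaximumPoolSize(10)

    val dataSource = new HikariDataSource(config)

    // Borrow a connection from the pool; close() returns it to the pool
    // rather than tearing down the underlying socket.
    val conn = dataSource.getConnection
    try {
      val stmt = conn.prepareStatement("SELECT customer_name FROM customers WHERE id = ?")
      stmt.setLong(1, 42L)
      val rs = stmt.executeQuery()
      while (rs.next()) println(rs.getString("customer_name"))
    } finally {
      conn.close()
    }

    dataSource.close()
  }
}
```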

Environment: Java (JDK 1.5), J2EE, JSP 1.2, Servlets, JavaScript, HTML, Struts 1.2, Spring, Hibernate, RAD 6.0, JSTL, Rational Clear Case, SQL, MQ Series, Windows XP.
