
Spark/Hadoop Developer Resume


Plano, TX

SUMMARY:

  • Spark developer with 5+ years of experience in big data application development using frameworks such as Hadoop, Spark, Hive, Sqoop, Flume, Oozie, and Kafka.
  • Hands-on experience with the Cloudera and Hortonworks Hadoop/Spark distributions.
  • Experience implementing Spark integrated with the Hadoop ecosystem.
  • Experience in data cleansing using Spark map and filter functions.
  • Experience in designing and developing applications in Spark using Scala.
  • Experience migrating MapReduce programs to Spark RDD transformations and actions to improve performance.
  • Experience in creating Hive tables and loading data from different file formats.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive.
  • Experience developing and debugging Hive queries.
  • Experience processing data using HiveQL and Pig Latin scripts for data analytics.
  • Extended Hive core functionality by writing UDFs for data analysis.
  • Experience converting HiveQL/SQL queries into Spark transformations using the Spark RDD and DataFrame APIs in Scala.
  • Used Oozie to manage and schedule Spark jobs on a Hadoop cluster.
  • Used the Hue GUI to implement Oozie schedulers and workflows.
  • Good experience importing data into and exporting data from Hive and HDFS with Sqoop.
  • Experience using the producer and consumer APIs of Apache Kafka.
  • Skilled in integrating Kafka with Spark Streaming for faster data processing.
  • Experience using the Spark Streaming programming model for real-time data processing.
  • Experience with file formats such as text, SequenceFile, JSON, Parquet, and ORC.
  • Extensively used Apache Kafka to collect logs and error messages across the cluster.
  • Excellent knowledge and understanding of distributed computing and parallel processing frameworks.
  • Experienced in performing read and write operations on the HDFS file system.
  • Experience working with large data sets and making performance improvements.
  • Experience working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
  • Extensive programming knowledge in developing Java applications using J2EE and JDBC.
  • Good experience working with Tableau, enabling JDBC/ODBC data connectivity from Tableau to Hive tables.
  • Experience creating and driving large-scale ETL pipelines.
  • Good with version control systems such as Git.
  • Strong knowledge of UNIX/Linux commands.
  • Adequate knowledge of the Python scripting language.
  • Adequate knowledge of Scrum, Agile, and Waterfall methodologies.
  • Highly motivated and committed to the highest levels of professionalism.
  • Strong written and oral communication skills; learn and adapt quickly to emerging technologies and paradigms.
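The HiveQL-to-DataFrame conversion and map/filter cleansing mentioned above can be sketched in Scala roughly as follows. This is a minimal, hypothetical example: the table, column, and path names are illustrative and not taken from any of the projects below.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToDataFrames {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hiveql-to-dataframes")
      .enableHiveSupport()
      .getOrCreate()

    // HiveQL: SELECT customer_id, SUM(amount) FROM orders
    //         WHERE status = 'COMPLETE' GROUP BY customer_id
    val totals = spark.table("orders")               // hypothetical Hive table
      .filter(col("status") === "COMPLETE")
      .groupBy("customer_id")
      .agg(sum("amount").as("total_amount"))

    // Cleansing at the RDD level with map and filter:
    val cleaned = spark.sparkContext
      .textFile("hdfs:///data/raw/orders")           // hypothetical path
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#")) // drop blanks/comments
      .map(_.split(",", -1))
      .filter(_.length == 3)                         // drop malformed rows

    totals.show()
    spark.stop()
  }
}
```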

TECHNICAL SKILLS:

Big Data Technologies: Apache Hadoop, Apache Spark, Map Reduce, Apache Hive, Apache Pig, Apache Sqoop, Apache Kafka, Apache Flume, Apache Oozie, Apache Zookeeper, HDFS

Databases: MySQL, Oracle 11g.

Languages: Scala, Java

Operating Systems: macOS, Windows 7/10, Linux (CentOS, Red Hat, Ubuntu).

Development Tools: Apache Tomcat, Eclipse, NetBeans, IntelliJ.

PROFESSIONAL EXPERIENCE:

Confidential, Plano, TX

Spark/Hadoop Developer

Responsibilities:

  • Worked on the Cloudera distribution, CDH 5.13.
  • Involved in working with Sqoop to fetch data from RDBMS sources.
  • Transformed the ingested data and stored it in DataFrames using Spark SQL.
  • Created Hive tables to load the transformed data.
  • Performed partitioning and bucketing in Hive for easier data classification.
  • Worked on performance tuning and optimization of Hive.
  • Involved in exporting Spark SQL DataFrames into Hive tables stored as Parquet files.
  • Involved in ingesting real-time log data from various producers using Kafka.
  • Used Spark Streaming to subscribe to the desired topics for real-time processing.
  • Transformed the DStreams into DataFrames using the Spark engine.
  • Experienced in performance tuning of Spark applications: setting the right batch interval and level of parallelism, and tuning memory for optimal efficiency.
  • Responsible for performing sort, join, aggregation, filter, and other transformations on the data.
  • Appended the DataFrames to pre-existing data in Hive.
  • Performed analysis on the Hive tables based on the business logic.
  • Created a data pipeline using Oozie workflows that runs jobs daily.
  • Involved in analyzing data by writing HiveQL queries for faster data processing.
  • Involved in persisting metadata into HDFS for further data processing.
  • Loaded data from Linux file systems to HDFS and vice versa using shell commands.
  • Used Git as the version control system.
  • Worked with Jenkins for continuous integration.
  • Built Hive tables on the transformed data and used different SerDes to store data in HDFS in different formats.
  • Used different APIs to perform the necessary transformations and actions on the data received from Kafka in real time.
  • Involved in collecting and transferring data from various web servers to HDFS using Apache Kafka.
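The streaming path described above (Kafka topics consumed by Spark Streaming, DStream batches converted to DataFrames, results appended to a Parquet-backed Hive table) could be sketched as below. The broker, topic, group, and table names are placeholders, not the project's actual configuration.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hive").enableHiveSupport().getOrCreate()
    // The batch interval is one of the knobs tuned for throughput vs. latency.
    val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",          // placeholder broker
      "key.deserializer"  -> classOf[StringDeserializer],
      "value.deserializer"-> classOf[StringDeserializer],
      "group.id"          -> "log-ingest",            // placeholder group
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("app-logs"), kafkaParams))

    // Convert each DStream batch to a DataFrame and append it to Hive.
    stream.map(_.value).foreachRDD { rdd =>
      import spark.implicits._
      if (!rdd.isEmpty()) {
        rdd.toDF("raw_line")
          .write.mode("append")
          .format("parquet")
          .saveAsTable("logs.app_events")             // pre-existing Hive table
      }
    }
    ssc.start()
    ssc.awaitTermination()
  }
}
```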

Environment: CDH 5.1, HDFS, Hadoop 3.0, Spark 2.4, Scala, Hive 3.0, Pig, Hue, Oozie, Sqoop, Kafka, Linux shell, Git, Jenkins, Agile.

Confidential, Charlotte, NC

Spark/Hadoop Developer

Responsibilities:

  • Worked on the Hortonworks enterprise distribution (HDP).
  • Worked on large sets of structured and semi-structured historical data.
  • Involved in working with Sqoop to import data from RDBMS sources to Hive.
  • Created Hive tables to load the data, stored as ORC files for processing.
  • Implemented Hive partitioning and bucketing for further classification of data.
  • Worked on performance tuning and optimization of Hive.
  • Involved in cleansing and transforming the data.
  • Used Spark SQL to sort, join, and filter the data.
  • Copied the ORC files to Amazon S3 buckets using Sqoop for further processing in Amazon EMR.
  • Wrote custom UDFs in Spark SQL using Scala.
  • Performed data aggregation operations using Spark SQL queries.
  • Copied output data back to Hive from Amazon S3 buckets using Sqoop once the output desired by the business was obtained.
  • Set up Kafka to subscribe to topics (sensors) and load data directly into Hive tables.
  • Automated daily filter and join operations that merge new data into the respective Hive tables using Oozie workflows.
  • Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule workflows.
  • Compared sensor data against a persisted table over a 24-hour period to check whether machines were operating at optimal conditions, and used Kafka as a messaging system to notify the data producer and the maintenance department when maintenance was required.
  • Used Git as Version Control System.
  • Worked with Jenkins for continuous integration.
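A minimal sketch of the custom Spark SQL UDF and aggregation work described above. The normalization rule, table, and column names are invented for illustration and are not the original business logic.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SensorUdfExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sensor-udf").enableHiveSupport().getOrCreate()

    // A UDF that normalizes a raw sensor reading into [0, 1].
    val normalize = udf((reading: Double, lo: Double, hi: Double) =>
      if (hi > lo) (reading - lo) / (hi - lo) else 0.0)
    spark.udf.register("normalize", normalize)   // also usable from SQL

    // Aggregate over the ORC-backed table, as in the pipeline above.
    spark.table("sensors.readings")              // hypothetical table
      .withColumn("norm", normalize(col("reading"), lit(0.0), lit(100.0)))
      .groupBy("sensor_id")
      .agg(avg("norm").as("avg_norm"), max("norm").as("max_norm"))
      .write.mode("overwrite").saveAsTable("sensors.daily_summary")

    spark.stop()
  }
}
```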

Environment: HDP 2.5, HDFS, Hadoop 2.7, Spark 2.1, Kafka, Amazon S3, EMR, Sqoop, Oozie, Hive 2.1, Pig, Hue, Linux shell, Git, Jenkins, Agile.

Confidential

Hadoop Developer

Responsibilities:

  • Worked under the Cloudera distribution.
  • Responsible for building scalable distributed data solutions using Hadoop; developed simple to complex MapReduce jobs.
  • Created and populated bucketed tables in Hive to allow for faster map-side joins, more efficient jobs, and more efficient sampling.
  • Also performed partitioning of the data to optimize Hive queries.
  • Handled regular imports of data from Oracle 11g into Hive tables using Sqoop, and later performed join operations on the data in Hive.
  • Developed user-defined functions in Hive to work on multiple input rows and provide an aggregated result based on the business requirement.
  • Wrote custom user-defined counters for MapReduce jobs to gain further insight and for debugging purposes.
  • Developed a MapReduce job to perform lookups of all entries for a given key from a collection of MapFiles created from the data.
  • Performed side-data distribution using the distributed cache to make read-only data available to the job processing the main dataset.
  • Used CombineFileInputFormat to ensure maps had sufficient data to process when there were large numbers of small files; also packaged collections of small files into a SequenceFile used as input to the MapReduce job.
  • Implemented LZO compression of map output to reduce I/O between mapper and reducer nodes.
  • Continuously monitored and managed the Hadoop cluster using the web console.
  • Developed Pig Latin scripts to extract data from the output files and load it into HDFS.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
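The LZO map-output compression mentioned above is wired up on the driver side roughly as follows (shown here in Scala against the Hadoop Java API). The `LzoCodec` class comes from the separate hadoop-lzo library, and the job name and paths are placeholders.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

object LzoJobDriver {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Compress intermediate map output to cut shuffle I/O between
    // mapper and reducer nodes.
    conf.setBoolean("mapreduce.map.output.compress", true)
    conf.set("mapreduce.map.output.compress.codec",
             "com.hadoop.compression.lzo.LzoCodec") // provided by hadoop-lzo

    val job = Job.getInstance(conf, "lzo-example")  // placeholder job name
    job.setJarByClass(LzoJobDriver.getClass)
    // Mapper/reducer classes omitted; this sketch shows only the
    // compression wiring.
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```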

Environment: CDH 5.0, HDFS, Hadoop 2.7, MapReduce, Spark 1.6, Hive 1.2, Pig, Hue, Oozie, Sqoop, Scala, Oracle 12c, YARN, Linux shell, Git, Jenkins, Agile.

Confidential

Python Developer

Responsibilities:

  • Experienced with Python frameworks such as webapp2 and Flask.
  • Experienced with the WAMP stack (Windows, Apache, MySQL, and PHP/Python) and MVC Struts.
  • Developed a mobile cross-browser web application using AngularJS and JavaScript APIs.
  • Successfully migrated the Django database from SQLite to MySQL, and then to PostgreSQL, with complete data integrity.
  • Used Celery with RabbitMQ and Flask to create a distributed worker framework.
  • Created an automation test framework using Selenium.
  • Responsible for the design and development of web pages using PHP, HTML, Joomla, and CSS, including Ajax controls and XML.
  • Developed an intranet portal for managing Amazon EC2 servers using Tornado and MongoDB.
  • Expertise in developing web applications implementing the Model-View-Controller (MVC) architecture using full-stack frameworks such as TurboGears.
  • Implemented monitoring and established best practices around using Elasticsearch.
  • Strong experience building large, responsive REST web applications using the CherryPy framework in Python.
  • Used a test-driven development (TDD) approach for developing the services required by the application.

Environment: Python 2.7/3.0, PL/SQL, C++, Redshift, XML, Agile (Scrum), PyUnit, MySQL, Apache, CSS, DHTML, HTML, JavaScript, shell scripts, Git, Linux, Unix, and Windows.
