Spark/Hadoop Developer Resume
Plano, TX
SUMMARY:
- Spark developer with 5+ years of experience in big data application development using Hadoop, Spark, Hive, Sqoop, Flume, Oozie, and Kafka.
- Hands-on experience with the Cloudera and Hortonworks Hadoop/Spark distributions.
- Experience implementing Spark integrated with the Hadoop ecosystem.
- Experience in data cleansing using Spark map and filter functions.
- Experience designing and developing applications in Spark using Scala.
- Experience migrating MapReduce programs to Spark RDD transformations and actions to improve performance.
- Experience in creating Hive Tables and loading the data from different file formats.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Experience developing and debugging Hive queries.
- Experience processing data with HiveQL and Pig Latin scripts for data analytics.
- Extended Hive core functionality by writing UDFs for data analysis.
- Experience converting HiveQL/SQL queries into Spark transformations using the RDD and DataFrame APIs in Scala (a brief sketch follows this summary).
- Used Oozie to manage and schedule Spark jobs on a Hadoop cluster.
- Used the Hue GUI to implement Oozie schedules and workflows.
- Good experience importing and exporting data to and from Hive and HDFS with Sqoop.
- Experience using the Producer and Consumer APIs of Apache Kafka.
- Skilled in integrating Kafka with Spark Streaming for faster data processing.
- Experience using the Spark Streaming programming model for real-time data processing.
- Experience with file formats such as text files, SequenceFiles, JSON, Parquet, and ORC.
- Extensively used Apache Kafka to collect logs and error messages across the cluster.
- Excellent knowledge and understanding of distributed computing and parallel processing frameworks.
- Experienced in performing read and write operations on the HDFS file system.
- Experience working with large data sets and making performance improvements.
- Experience working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
- Extensive programming knowledge in developing Java applications using Java, J2EE, and JDBC.
- Good experience working with Tableau and enabling JDBC/ODBC connectivity from Tableau to Hive tables.
- Experience creating and driving large-scale ETL pipelines.
- Good with version control systems such as Git.
- Strong knowledge of UNIX/Linux commands.
- Adequate knowledge of the Python scripting language.
- Adequate knowledge of Scrum, Agile, and Waterfall methodologies.
- Highly motivated and committed to the highest levels of professionalism.
- Strong written and oral communication skills; able to learn and adapt quickly to emerging technologies and paradigms.
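A minimal sketch of the HiveQL-to-DataFrame conversion and map/filter-style cleansing summarized above, assuming a Hive-enabled Spark session; the database, table, column names, and output path are placeholders rather than production values:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CustomerCleanseSketch {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so existing Hive tables are visible to Spark SQL.
    val spark = SparkSession.builder()
      .appName("customer-cleanse-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // DataFrame equivalent of a HiveQL "WHERE ... GROUP BY ..." query.
    // raw_db.customers, status, and region are placeholder names.
    val cleaned = spark.table("raw_db.customers")
      .filter(col("status").isNotNull)                  // drop incomplete rows
      .withColumn("region", upper(trim(col("region")))) // normalize a text column

    val summary = cleaned
      .groupBy("region")
      .agg(count("*").as("customer_count"))

    // Persist as Parquet for downstream Hive and analytics queries.
    summary.write.mode("overwrite").parquet("/tmp/customer_summary")

    spark.stop()
  }
}
```

The same aggregation could equally be written as a spark.sql(...) string; the DataFrame form is shown because it mirrors the HiveQL-to-DataFrame conversion work listed above.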
TECHNICAL SKILLS:
Big Data Technologies: Apache Hadoop, Apache Spark, MapReduce, Apache Hive, Apache Pig, Apache Sqoop, Apache Kafka, Apache Flume, Apache Oozie, Apache ZooKeeper, HDFS
Databases: MySQL, Oracle 11g.
Languages: Scala, Java
Operating Systems: macOS, Windows 7/10, Linux (CentOS, Red Hat, Ubuntu).
Development Tools: Apache Tomcat, Eclipse, NetBeans, IntelliJ.
PROFESSIONAL EXPERIENCE:
Confidential, Plano, TX
Spark/Hadoop Developer
Responsibilities:
- Worked on the Cloudera distribution, CDH 5.13.
- Worked with Sqoop to fetch data from RDBMS sources.
- Transformed the ingested data and stored it in DataFrames using Spark SQL.
- Created Hive tables to load the transformed data.
- Implemented partitioning and bucketing in Hive for easier data classification.
- Worked on performance tuning and optimization of Hive.
- Exported Spark SQL DataFrames into Hive tables stored as Parquet files.
- Ingested real-time log data from various producers using Kafka.
- Used Spark Streaming to subscribe to the desired Kafka topics for real-time processing.
- Transformed the DStreams into DataFrames using the Spark engine (a sketch of this pipeline follows the environment line below).
- Tuned Spark applications by setting the right batch interval, level of parallelism, and memory configuration for optimal efficiency.
- Performed sort, join, aggregation, filter, and other transformations on the data.
- Appended the DataFrames to pre-existing data in Hive.
- Analyzed the Hive tables based on the business logic.
- Created a data pipeline using Oozie workflows that runs jobs on a daily basis.
- Analyzed data by writing HiveQL queries for faster data processing.
- Persisted metadata into HDFS for further data processing.
- Loaded data from Linux file systems to HDFS and vice versa using shell commands.
- Used Git as the version control system.
- Worked with Jenkins for continuous integration.
- Built Hive tables on the transformed data and used different SerDes to store data in HDFS in different formats.
- Used different APIs to perform the necessary transformations and actions on the data received from Kafka in real time.
- Collected and transferred data from various web servers to HDFS using Apache Kafka.
Environment: CDH 5.13, HDFS, Hadoop 3.0, Spark 2.4, Scala, Hive 3.0, Pig, Hue, Oozie, Sqoop, Kafka, Linux shell, Git, Jenkins, Agile.
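A minimal sketch of the Kafka-to-Spark Streaming-to-Hive flow described in this role, assuming the spark-streaming-kafka-0-10 integration and a pre-existing Parquet-backed Hive table; the broker address, topic, consumer group, batch interval, and table/column names are illustrative only:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object LogStreamToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("log-stream-to-hive")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // The 30-second batch interval is illustrative; in practice it is tuned
    // together with parallelism and memory settings.
    val ssc = new StreamingContext(spark.sparkContext, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",             // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "log-ingest",                        // placeholder group
      "auto.offset.reset" -> "latest"
    )

    // Subscribe to the desired topic; "app_logs" is a placeholder name.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("app_logs"), kafkaParams))

    // Turn each micro-batch into a DataFrame and append it to a pre-existing
    // Hive table (assumed to have a matching two-column schema).
    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val df = rdd.map(r => (r.key, r.value)).toDF("log_key", "log_line")
        df.write.mode("append").insertInto("logs_db.app_logs")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```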
Confidential, Charlotte, NC
Spark/Hadoop Developer
Responsibilities:
- Worked on the Hortonworks (HDP) enterprise distribution.
- Worked on large sets of structured and semi-structured historical data.
- Worked with Sqoop to import data from RDBMS sources into Hive.
- Created Hive tables to load the data, stored as ORC files for processing.
- Implemented Hive partitioning and bucketing for further classification of the data.
- Worked on performance tuning and optimization of Hive.
- Cleansed and transformed the data.
- Used Spark SQL to sort, join, and filter the data.
- Copied the ORC files to Amazon S3 buckets using Sqoop for further processing in Amazon EMR.
- Wrote custom UDFs in Spark SQL using Scala (a sketch follows the environment line below).
- Performed data aggregation operations using Spark SQL queries.
- Copied the output data back to Hive from Amazon S3 buckets using Sqoop once it met the business requirements.
- Set up Kafka to subscribe to topics (sensor feeds) and load the data directly into Hive tables.
- Automated daily filter and join operations that merge new data into the respective Hive tables using Oozie workflows.
- Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and to schedule workflows.
- Compared the sensor data against a persisted table over a 24-hour window to check whether each machine was operating at optimal conditions, and used Kafka as a messaging system to notify the data's producer and the maintenance department when maintenance was required.
- Used Git as the version control system.
- Worked with Jenkins for continuous integration.
Environment: HDP 2.5, HDFS, Hadoop 2.7, Spark 2.1, Kafka, Amazon S3, EMR, Sqoop, Oozie, Hive 2.1, Pig, Hue, Linux shell, Git, Jenkins, Agile.
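A minimal sketch of the Scala UDF and Spark SQL aggregation work described in this role, assuming ORC sensor data staged on S3 with an S3A-configured cluster; the bucket path, column names, threshold, and output table are illustrative only:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SensorAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sensor-aggregation-sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Custom UDF: flag readings outside an assumed operating range.
    val outOfRange = udf((temperature: Double) => temperature > 90.0)
    // Registering makes the same function usable from plain SQL strings.
    spark.udf.register("out_of_range", outOfRange)

    // ORC data staged on S3 (placeholder bucket/prefix); assumes event_time is a
    // timestamp column and sensor_id/temperature exist in the schema.
    val readings = spark.read.orc("s3a://example-bucket/sensor-readings/")

    val hourly = readings
      .withColumn("alert", outOfRange($"temperature"))
      .groupBy($"sensor_id", window($"event_time", "1 hour"))
      .agg(
        avg("temperature").as("avg_temp"),
        sum(when($"alert", 1).otherwise(0)).as("alert_count")
      )

    // Write the aggregated result back as an ORC-backed Hive table.
    hourly.write.mode("overwrite").format("orc").saveAsTable("analytics.sensor_hourly")

    spark.stop()
  }
}
```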
Confidential
Hadoop Developer
Responsibilities:
- Worked on the Cloudera distribution.
- Responsible for building scalable distributed data solutions using Hadoop; developed simple to complex MapReduce jobs.
- Created and populated bucketed tables in Hive to enable faster map-side joins, more efficient jobs, and more efficient sampling.
- Also partitioned the data to optimize Hive queries.
- Regularly imported data from Oracle 11g into Hive tables using Sqoop, then performed join operations on the data in Hive.
- Developed user-defined functions in Hive to work on multiple input rows and provide an aggregated result based on the business requirements.
- Wrote custom counters for the MapReduce jobs to gain further insight and aid debugging (a sketch follows the environment line below).
- Developed a MapReduce job to look up all entries for a given key from a collection of MapFiles created from the data.
- Distributed side data using the distributed cache to make read-only data available to the jobs processing the main dataset.
- Used CombineFileInputFormat to ensure maps had sufficient data to process when there were large numbers of small files; also packaged collections of small files into SequenceFiles used as input to the MapReduce jobs.
- Implemented LZO compression of map output to reduce I/O between mapper and reducer nodes.
- Continuously monitored and managed the Hadoop cluster using the web console.
- Developed Pig Latin scripts to extract data from the output files and load it into HDFS.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
Environment: CDH 5.0, HDFS, Hadoop 2.7, MapReduce, Spark 1.6, Hive 1.2, Pig, Hue, Oozie, Sqoop, Scala, Oracle 12c, YARN, Linux shell, Git, Jenkins, Agile.
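A minimal sketch of the custom-counter technique mentioned in this role, written in Scala (listed in the environment) against the Hadoop MapReduce API; the pipe-delimited five-field record layout, counter names, and input/output paths are assumptions for illustration:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Map-only validation job: well-formed records pass through, malformed ones
// are only counted. The pipe-delimited five-field layout is an assumption.
class ValidateMapper extends Mapper[LongWritable, Text, Text, NullWritable] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, NullWritable]#Context): Unit = {
    val fields = value.toString.split('|')
    if (fields.length == 5) {
      context.getCounter("DataQuality", "VALID_RECORDS").increment(1)
      context.write(value, NullWritable.get())
    } else {
      // Custom counter surfaced in the job/history UI for debugging.
      context.getCounter("DataQuality", "MALFORMED_RECORDS").increment(1)
    }
  }
}

object ValidateDriver {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "validate-records")
    job.setJarByClass(classOf[ValidateMapper])
    job.setMapperClass(classOf[ValidateMapper])
    job.setNumReduceTasks(0)                      // map-only job
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[NullWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```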
Confidential
Python Developer
Responsibilities:
- Experienced with Python frameworks such as webapp2 and Flask.
- Experienced in the WAMP stack (Windows, Apache, MySQL, and Python/PHP) and MVC Struts.
- Developed a mobile, cross-browser web application using AngularJS and JavaScript APIs.
- Successfully migrated the Django database from SQLite to MySQL to PostgreSQL with complete data integrity.
- Used Celery with RabbitMQ and Flask to create a distributed worker framework.
- Created Automation test framework using Selenium.
- Responsible for the design and development of web pages using PHP, HTML, Joomla, and CSS, including Ajax controls and XML.
- Developed intranet portal for managing Amazon EC2 servers using Tornado and MongoDB.
- Expertise in developing web applications implementing the Model-View-Controller (MVC) architecture using full-stack frameworks such as TurboGears.
- Implemented monitoring and established best practices around Elasticsearch.
- Strong experience building large, responsive REST web applications using the CherryPy framework in Python.
- Used a test-driven development (TDD) approach for developing the services required by the application.
Environment: Python 2.7/3.0, PL/SQL, C++, Redshift, XML, Agile (Scrum), PyUnit, MySQL, Apache, CSS, DHTML, HTML, JavaScript, shell scripts, Git, Linux, Unix, and Windows.