Spark/Hadoop Developer Resume
Nashville, TN
PROFESSIONAL SUMMARY
- Overall 8+ years of experience in all phases of software application development, including requirement analysis, design, development and maintenance of Hadoop/Big Data applications and web applications using Java/J2EE technologies.
- 3+ years of hands-on experience with Big Data ecosystems including Hadoop (1.0 and YARN), MapReduce, Spark, Pig, Hive, Sqoop, Flume, Oozie and ZooKeeper in a range of industries.
- Good understanding/knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Expertise in Hadoop ecosystem components - Spark, Hive, Pig, Sqoop, HBase, Flume, Kafka, Oozie, Hue, Zeppelin, NiFi - and EC2 cloud computing with AWS.
- Experience in importing and exporting data using Sqoop between relational database systems and HDFS.
- Developed Oozie workflows by integrating all tasks related to a project and scheduled the jobs as per requirements.
- Used HBase alongside Hive as and when required for real-time, low-latency queries.
- In-depth knowledge of Spark architecture and real-time streaming using Spark.
- Extensively used the Spark SQL, PySpark and Scala APIs for querying and transforming data residing in Hive (a minimal sketch follows this list).
- Good knowledge of Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS and VPC.
- Hands-on experience spinning up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
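A minimal Java sketch of the kind of Spark SQL access to Hive described above; the database, table and column names (sales.orders, customer_id, amount) and the output path are hypothetical placeholders, not details from any specific project.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveQueryExample {
    public static void main(String[] args) {
        // SparkSession with Hive support so existing Hive tables are visible
        SparkSession spark = SparkSession.builder()
                .appName("HiveQueryExample")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical database/table/column names, used purely for illustration
        Dataset<Row> totals = spark.sql(
                "SELECT customer_id, SUM(amount) AS total_amount "
                + "FROM sales.orders GROUP BY customer_id");

        // Persist the aggregated result as Parquet (path is a placeholder)
        totals.write().mode("overwrite").parquet("/data/output/customer_totals");

        spark.stop();
    }
}
```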
TECHNICAL SKILLS
Hadoop Ecosystem: Hadoop, MapReduce, HDFS, Kafka, Hive, Pig, Sqoop, Impala, Oozie, Flume, YARN, ZooKeeper, HBase.
Spark components: Spark, Spark SQL, Spark Streaming, Python.
AWS Cloud Services: S3, EBS, EC2, VPC, Redshift, EMR
Programming Languages: Java, Scala, SQL, Shell scripting, AngularJS, HTML5, and CSS.
Operating Systems: Windows, UNIX, Linux distributions (CentOS, Ubuntu).
Databases: Cassandra, HBase, Oracle, DB2, MySQL, SQLite, MS SQL Server 2008/2012.
Development Processes: RUP, Agile, Scrum.
Big Data Platforms: Cloudera, Hortonworks, Amazon EMR.
PROFESSIONAL EXPERIENCE
Confidential, Nashville, TN
Spark/Hadoop Developer
Responsibilities:
- Interacted with multiple teams to understand their business requirements for designing flexible and common components.
- Developed data pipelines using Sqoop and Flume to ingest data from Teradata into HDFS for further processing through Spark.
- Implemented partitioning and bucketing in Hive, using file formats and compression techniques with optimizations.
- Experience working with Avro and Parquet file formats with Snappy compression.
- Developed Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Extensively worked on Text, ORC, Avro and Parquet file formats and compression techniques like Snappy, Gzip and Zlib.
- Conceived and designed the producer and consumer using Kafka 0.10 and Spark Streaming (a minimal consumer sketch follows this list).
- Automated the Hadoop pipeline using Oozie and scheduled it with a coordinator based on time frequency and data availability.
- Participated in evaluation and selection of new technologies to support system efficiency.
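A minimal Java sketch of a Spark Streaming consumer for Kafka 0.10 using the spark-streaming-kafka-0-10 direct-stream API; the broker address, topic, consumer group and the per-batch count are illustrative placeholders rather than the actual pipeline logic.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaStreamConsumer {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaStreamConsumer");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Kafka 0.10 consumer configuration; broker, group and topic are placeholders
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "spark-consumer-group");
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Collections.singletonList("events");

        // Direct stream: Spark executors read partitions straight from Kafka
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Count records per micro-batch as a stand-in for the real business logic
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```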
Environment: Hadoop, Cloudera, HDFS, Hive, Impala, Spark, Autosys, Kafka, Sqoop, Pig, Java, Scala, Eclipse, Teradata, UNIX, and Maven.
Confidential, St. Louis, Missouri
Hadoop Developer
Responsibilities:
- Developed Spark Programs for Batch processing.
- Worked on Spark SQL and Spark Streaming.
- Developed Spark scripts using Python and Scala shell commands as per the requirements.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Used Spark SQL with Scala for creating DataFrames and performed transformations on them.
- Implemented Spark SQL to access Hive tables in Spark for faster processing of data.
- Installed and configured Hive and wrote Hive UDFs (a minimal UDF sketch follows this list).
- Involved in creating Hive tables, loading data and writing Hive queries.
- Imported and exported data into HDFS using Sqoop, including incremental loading.
- Experienced in defining job flows and in managing and reviewing Hadoop log files.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Managed jobs using the Fair Scheduler and cluster coordination services through ZooKeeper.
- Involved in loading data from UNIX file system to HDFS.
- Hands-on experience in Oozie job scheduling.
- Worked closely with AWS to migrate entire data centers to the cloud using VPC, EC2, S3 and EMR.
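A minimal Java sketch of a classic Hive UDF of the kind mentioned above; the package, class name, registered function name and upper-casing logic are hypothetical examples, not the project's actual UDFs.

```java
package com.example;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that upper-cases a string column.
// Registered in Hive with: CREATE TEMPORARY FUNCTION to_upper AS 'com.example.ToUpperUDF';
public class ToUpperUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;   // propagate NULLs as Hive expects
        }
        return new Text(input.toString().toUpperCase());
    }
}
```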
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java, Scala, Spark, Hortonworks, HBase, Amazon EMR, EC2, S3.
Confidential
Hadoop Developer
Responsibilities:
- Part of the team for developing and writing Pig scripts.
- Loaded data from the RDBMS server into Hive using Sqoop.
- Created Hive tables to store the processed results in a tabular format.
- Developed Sqoop scripts to enable interaction between Hive and the MySQL database.
- Developed Java Mapper and Reducer programs for complex business requirements (a minimal sketch follows this list).
- Developed custom Java record readers, partitioners and serialization techniques.
- Created Managed tables and External tables in Hive and loaded data from HDFS.
- Performed complex HiveQL queries on Hive tables and created custom user-defined functions in Hive.
- Optimized Hive tables using techniques like partitioning and bucketing to provide better performance with HiveQL queries.
- Performed Sqoop imports from Oracle to load data into HDFS and directly into Hive tables.
- Performed incremental data movement to Hadoop using Sqoop.
- Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Analyzed Hadoop logs using Pig scripts to track down errors caused by the team.
- Experience in gathering requirements from the client, giving estimates for developing projects and delivering projects on time.
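A minimal Java Mapper/Reducer sketch showing the structure of such programs; the token-counting logic and class names are illustrative stand-ins for the project-specific business rules, custom record readers and partitioners.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emits (token, 1) for each whitespace-separated token in a line
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reducer: sums the counts emitted for each token
class TokenReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```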
Environment: Java, Hadoop, MapReduce, HDFS, Pig, Hive, Spark, Scala, Hortonworks, HBase.
Confidential
Java Developer
Responsibilities:
- Involved in analysis, design, development and testing of the application.
- Incorporated UML diagrams (Class diagrams, Activity diagrams, Sequence diagrams) as part of design documentation and other system documentation.
- Enhanced the Port search functionality by adding a VPN Extension tab.
- Created end-to-end functionality for viewing and editing VPN Extension details.
- Used the Agile process to develop the application, as it allows faster development compared to RUP.
- Used Hibernate as the persistence framework.
- Used Struts MVC framework and WebLogic Application Server in this application.
- Involved in creating DAOs and used Hibernate for ORM mapping (a minimal DAO sketch follows this list).
- Implemented the Spring Framework for rapid development and ease of maintenance.
- Wrote procedures and triggers for validating the consistency of metadata.
- Wrote SQL code blocks using cursors for shifting records between tables based on checks.
- Fixed defects and generated input XMLs to run on the SOA client to generate output XML for testing web services.
- Wrote Java classes to test the UI and web services through JUnit and JWebUnit.
- Extensively involved in release/deployment related critical activities.
- Performed functional and integration testing of the entire application using JUnit and JWebUnit.
- Used Log4J to log both user-interface and domain-level messages, and CVS for version control.
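A minimal Java sketch of the classic Hibernate session/transaction DAO pattern referenced above; the VpnExtension entity, its fields and the DAO class name are hypothetical placeholders, and the Hibernate mapping itself is omitted.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

// Minimal stand-in for a Hibernate-mapped entity (mapping file/annotations omitted)
class VpnExtension {
    private Long id;
    private String name;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// Hypothetical DAO wrapping the usual open-session/transaction/commit pattern
public class VpnExtensionDao {
    private final SessionFactory sessionFactory;

    public VpnExtensionDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public void save(VpnExtension extension) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.save(extension);   // persist the mapped entity
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();             // roll back on failure so data stays consistent
            throw e;
        } finally {
            session.close();
        }
    }

    public VpnExtension findById(Long id) {
        Session session = sessionFactory.openSession();
        try {
            return (VpnExtension) session.get(VpnExtension.class, id);
        } finally {
            session.close();
        }
    }
}
```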
Environment: Java, JSP, Servlets, J2EE, EJB, Struts Framework, JDBC, WebLogic Application Server, Hibernate, Spring Framework, Oracle 9i, UNIX, Web Services, CVS, Eclipse, JUnit, JWebUnit.