
Hadoop/Spark Developer Resume


NC

PROFESSIONAL SUMMARY:

  • 8+ years of technical expertise across the complete software development life cycle, including analysis, design, development, testing and implementation, with 3+ years of experience in Hadoop development.
  • Experience with major Hadoop ecosystem projects such as Pig, Hive, Spark, Sqoop, Kafka, Flume, HBase, Cassandra and ZooKeeper.
  • Experience in developing and implementing MapReduce jobs in Java to process and analyze large datasets.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode and DataNode, and of MapReduce concepts.
  • Experience in importing and exporting data between RDBMS servers such as MySQL, Oracle and Teradata and HDFS/Hive using Sqoop.
  • Experience in developing Pig Latin and HiveQL scripts for data analysis and ETL, and in extending their default functionality by writing User Defined Functions (UDFs) for data-specific processing.
  • Experience in designing table partitioning and bucketing, and in optimizing Hive scripts using different performance utilities and techniques.
  • Experience in developing Hive UDFs and running Hive scripts using different execution engines such as Tez and Spark (Hive on Spark).
  • Experienced in developing Spark applications using Spark Core and Spark SQL.
  • Experienced in creating DataFrames and implementing multiple Transformations and Actions.
  • Experienced in automating Sqoop and Hive queries using Oozie workflows.
  • Expertise in scheduling jobs using Oozie workflows in Autosys.
  • Good knowledge of AWS cloud services such as S3, EC2 and EMR.
  • Proficient in working with different file formats and compression techniques such as Avro, Parquet, Gzip, LZO, Snappy and Bzip2.

TECHNICAL SKILLS:

Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Spark, Sqoop, ZooKeeper, Kafka, Oozie, Flume, HBase, Cloudera and Hortonworks

Programming Languages: J2SE, J2EE, Scala, SQL and Unix shell Scripting

Scripting & Query Languages: HTML, CSS, JavaScript, DHTML, XML, jQuery, Shell Scripting

Cloud Technologies: AWS, EC2, EMR, S3

Databases & NoSQL: MySQL 5.0, SQL Server, Oracle 10g (PL/SQL), Teradata, HBase

Operating Systems: Windows 7/8, Unix/Linux

PROFESSIONAL EXPERIENCE:

Confidential, NC

HADOOP/SPARK DEVELOPER

Responsibilities:

  • Built scalable distributed data solutions using Hadoop
  • Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.
  • Used Sqoop to migrate data and perform incremental imports into HDFS and Hive tables from Teradata and Oracle.
  • Managed jobs using the Fair Scheduler and developed job processing scripts using Oozie workflows.
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables.
  • Ran Hive scripts through Hive, Hive on Spark and, in some cases, Spark SQL.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Tuned the performance of Spark applications, covering shuffle operations, level of parallelism and memory usage.
  • Performed transformations in the staging layer using Spark SQL via SQLContext and HiveContext.
  • Created multiple DataFrames, joined them, applied Spark transformations and actions, and saved the data in HDFS in Parquet format.
  • Implemented a POC for analyzing streaming data with Apache Kafka and the Spark Streaming API
  • Followed Agile & Scrum principles in developing the project.
  • Used an SVN repository to check in and check out code
  • Monitored and managed the Hadoop cluster through Cloudera Manager

Environment: HDFS, MapReduce, YARN, Hive, Sqoop, Spark Core, Spark SQL, Scala, Autosys, Oozie, Kafka, Oracle, Teradata, UNIX Shell Scripting, Cloudera.

Confidential, CA

HADOOP DEVELOPER

Responsibilities:

  • Involved in the complete software development life cycle (SDLC) of the application.
  • Analyzed the Hadoop cluster using different big data analytics tools, including Pig, Hive and MapReduce, on EC2
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Worked with different source data file formats such as JSON and CSV.
  • Imported data from various sources such as MySQL and Netezza using Sqoop, performed transformations using Hive and Pig, and loaded the data back into HDFS.
  • Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Worked on sequence files, map-side joins, bucketing and partitioning to improve Hive performance and storage.
  • Performed batch processing of logs from various data sources using MapReduce
  • Developed Hive generic UDFs to implement business logic.
  • Experience working with a multi-node cluster tool that offers several commands for reporting HBase usage.
  • Implemented a POC to migrate iterative MapReduce programs to Spark transformations using Scala.
  • Used Oozie workflow scheduler templates to manage various jobs such as Sqoop, MR, Pig, Hive and shell scripts
  • Implemented test scripts to support test-driven development and continuous integration.

Environment: HDFS, MapReduce, Hive, Pig, Sqoop, Spark, ZooKeeper, HBase, Oozie, MySQL, AWS, EC2, S3 and UNIX Shell Scripting.

Confidential, DE

HADOOP DEVELOPER

Responsibilities:

  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive and Apache Pig.
  • Wrote custom InputFormat and RecordReader classes for reading and processing binary-format data in MapReduce
  • Managed and scheduled jobs on a Hadoop cluster using Oozie.
  • Responsible for designing and managing the Sqoop jobs that uploaded the data from Oracle to HDFS and Hive.
  • Used the Hue interface to query the data. Created Hive tables to store the processed results in a tabular format.
  • Comprehensive knowledge of and experience in process improvement, normalization/de-normalization, data extraction, data cleansing and data manipulation
  • Performed data transformations in Hive and used partitions and buckets for performance improvements.
  • Used HBase to store the analyzed and processed data for scalability.
  • Utilized cluster co-ordination services through ZooKeeper.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Participated in the requirement gathering and analysis phase of the project, documenting the business requirements by conducting workshops and meetings with business users.

Environment: MapReduce, Java, Hadoop, Cloudera, Pig, Hive, Oozie, Sqoop, Oracle, ZooKeeper & Eclipse.

Confidential

JAVA DEVELOPER

Responsibilities:

  • Developed the application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture
  • Used UML diagrams such as use case, class, interaction (sequence and collaboration) and activity diagrams
  • Worked in an Agile environment with a content management system for workflow management and content versioning
  • Involved in designing user screens and validations using HTML, jQuery, Ext JS and JSP as per user requirements
  • Responsible for validation of Client interface JSP pages using Struts form validation
  • Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management
  • Used the Hibernate 3.0 object-relational mapping framework to persist and retrieve data from the database
  • Wrote SQL queries, stored procedures, and triggers to perform back-end database operations
  • Developed Ant scripts for compilation, packaging and deployment to the WebSphere server
  • Implemented the logging mechanism using Log4j framework
  • Designed and implemented algorithms
  • Wrote test cases in JUnit for unit testing of classes

Environment: JDK 1.5, J2EE 1.4, Struts 1.3, Web Services (JAX-WS, Axis 2), Hibernate 3.0, Servlets 2.5, SQL Server 2005, Windows XP, HTML, XML, XSD, Log4j, jQuery

Confidential

JAVA DEVELOPER

Responsibilities:

  • Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC)
  • Reviewed the functional, design, source code and test specifications
  • Developed the complete front end using JavaScript and CSS
  • Authored functional, design and test specifications
  • Implemented the back-end, configuration DAO and XML generation modules of DIS
  • Used JDBC for database access
  • Used the Data Transfer Object (DTO) design pattern
  • Performed unit testing and rigorous integration testing of the whole application
  • Wrote and executed test scripts using JUnit
  • Developed an XML parsing tool for regression testing
  • Prepared the installation guide, customer guide and configuration document, which were delivered to the customer along with the product

Environment: Java, JavaScript, HTML, CSS, JDK 1.5.1, JDBC, XML, and UML
