
Senior Hadoop Developer Resume

Iowa City, IA

SUMMARY:

  • Talented and accomplished Software Engineer with 8 years of IT experience developing applications using Big Data, AWS, Java, SQL and Spark.
  • Extensive experience with Big Data tools such as MapReduce, YARN, HDFS, HBase, Impala, Hive, Pig, Oozie, AWS and Apache Spark for ingestion, storage, querying, processing and analysis of data.
  • Performance tuning in Hive and Impala using methods including, but not limited to, dynamic partitioning, bucketing, indexing and file compression.
  • Hands-on experience with the data ingestion tools Kafka and Flume and the workflow management tools Oozie and Zena.
  • Hands-on experience handling different file formats such as JSON, Avro, ORC and Parquet, and compression codecs such as Snappy, zlib and LZO.
  • Hands-on experience with Hadoop ecosystem components such as Hadoop, Spark, HDFS, YARN, Tez, Hive, Sqoop, Flume, MapReduce, Scala, Pig, Oozie, Kafka, NiFi, Storm and HBase.
  • Experience analyzing data in NoSQL databases such as HBase and Cassandra and integrating them with a Hadoop cluster.
  • Hands-on experience with Spark Core, Spark SQL and the DataFrame/Dataset/RDD APIs.
  • Experience using Kafka brokers with Spark Streaming to process live streaming data as RDDs (see the sketch after this list).
  • Developed Java applications using IDEs such as Spring Tool Suite and Eclipse.
  • Good knowledge of using Hibernate for mapping Java classes to database tables and of Hibernate Query Language (HQL).
  • Worked on Java/J2EE systems with different databases, including Oracle, MySQL and DB2.
  • Knowledge of implementing Big Data workloads on Amazon Elastic MapReduce (EMR), running the Hadoop framework on dynamically scalable Amazon EC2 instances.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Extensive development experience with Spark applications for data transformations and loading into HDFS using RDDs, DataFrames and Datasets.
  • Extensive knowledge of performance tuning Spark applications and converting Hive/SQL queries into Spark transformations.
  • Hands-on experience with AWS (Amazon Web Services): using Elastic MapReduce (EMR), creating and storing data in S3 buckets, and creating Elastic Load Balancers (ELB) for Hadoop front-end web UIs.
  • Extensive knowledge of creating Hadoop clusters on multiple EC2 instances in AWS, configuring them through Ambari, and using IAM (Identity and Access Management) to create groups and users and assign permissions.
  • Extensive programming experience with core Java concepts such as OOP, multithreading, collections and IO.
  • Experience using Jira for ticketing issues and Jenkins for continuous integration.
  • Extensive experience with UNIX commands, shell scripting and setting up cron jobs.
  • Experience in software configuration management using Git.
  • Good experience using the relational databases Oracle and MySQL.
  • Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping and design.
  • Work successfully in fast-paced environments, both independently and in collaborative teams.
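
A minimal sketch of the Kafka-to-Spark-Streaming pattern referenced above, using the Spark 1.6-era direct stream API; the broker addresses, topic name and output path are hypothetical:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object LiveEvents {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("live-events"), Seconds(30))
        // Hypothetical broker list; the direct stream reads Kafka without a receiver,
        // yielding one RDD partition per Kafka partition
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("events"))
        stream.map(_._2)                // keep only the message payload
          .filter(_.nonEmpty)
          .foreachRDD { (rdd, time) =>
            // One output directory per batch interval
            if (!rdd.isEmpty) rdd.saveAsTextFile(s"/data/events/raw/${time.milliseconds}")
          }
        ssc.start()
        ssc.awaitTermination()
      }
    }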

TECHNICAL SKILLS:

Operating Systems: Windows 95/98/2000/XP, UNIX, Linux

Languages: SQL, HTML, CSS, JavaScript, Java, R

Databases: Oracle, DB2, SQL Server, MySQL, PostgreSQL, MS Access, Hive, Spartan

Utilities: MS Word, Excel, Macros, Access, PowerPoint

Hadoop Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase

PROFESSIONAL EXPERIENCE:

Senior Hadoop Developer

Confidential - Iowa City, IA

Responsibilities:

  • Strong understanding and practical experience in developing Spark applications with Scala.
  • Developed Spark scripts by using Spark shell commands as per the requirement.
  • Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs in Spark for data aggregation (see the first sketch after this list).
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using the Spark context, Spark SQL, DataFrames and pair RDDs.
  • Experience developing Spark SQL applications using both SQL and the DataFrame DSL.
  • Extensively worked with the Parquet file format and gained practical knowledge writing Spark and Hive applications against Parquet data.
  • Experience using various compression techniques along with the Parquet file format.
  • Experience managing datasets and good experience creating test datasets for development purposes.
  • Experience building dimension and fact tables using Spark Scala applications.
  • Practical knowledge of writing applications in Scala that interact with Hive through Spark.
  • Extensively used Hive partitioned tables, map joins and bucketing, and gained a good understanding of dynamic partitioning (see the second sketch after this list).
  • Performed a POC writing Spark applications in Scala, Python and R.
  • Good hands-on experience using Hive to perform data queries and analysis as part of QA.
  • Practical experience using Pig to perform QA by computing statistics on the final output.
  • Experience designing both time-driven and data-driven automated workflows using Oozie.
  • Experience writing Sqoop scripts to import data from Exadata into HDFS.
  • Good exposure to MongoDB, its functionality and use cases.
  • Gained good exposure to the Hue interface for monitoring job status, managing HDFS files, tracking scheduled jobs and managing Oozie workflows.
  • Performed optimization and performance tuning in Spark and Hive.
  • Developed UNIX scripts to automate data loads into HDFS.
  • Strong knowledge of HDFS commands for managing files, and a good understanding of managing the file system through Spark Scala applications.
  • Extensive use of aliases for Oozie and HDFS commands.
  • Experienced in managing and reviewing Hadoop log files.
  • Experience controlling logging in Spark applications, with extensive use of Log4j to log each phase of the application.
  • Good knowledge of Git commands, version tagging and pull requests.
  • Performed unit and integration testing after development and participated in code reviews.
  • Experience writing JUnit test cases to test Spark and Spark SQL applications.
  • Practical experience developing applications with IntelliJ and Maven.
  • Good exposure to Agile environments; participated in daily standups, Big Room Planning, sprint meetings and team retrospectives.
  • Interacted with business analysts to understand business requirements and translate them into technical requirements.
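
A minimal sketch of the kind of DataFrame aggregation with a Scala UDF described above; the claims table, its columns and the normalization rule are hypothetical, and the API shown is the Spark 1.6-era HiveContext matching the environment below:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions.{sum, udf}

    object ClaimAggregation {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("claim-aggregation"))
        val sqlContext = new HiveContext(sc)
        import sqlContext.implicits._

        // Hypothetical UDF: normalize free-form state codes before grouping
        val normalizeState = udf((s: String) => if (s == null) "UNKNOWN" else s.trim.toUpperCase)

        sqlContext.table("claims")                        // hypothetical Hive table
          .withColumn("state", normalizeState($"state"))
          .groupBy($"state")
          .agg(sum($"amount").as("total_amount"))
          .write.mode("overwrite").saveAsTable("claims_by_state")
      }
    }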
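
And a minimal sketch of the Hive dynamic partitioning pattern mentioned above, issued through a HiveContext so the examples stay in one language; table and column names are again hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object DynamicPartitionLoad {
      def main(args: Array[String]): Unit = {
        val sqlContext = new HiveContext(new SparkContext(new SparkConf().setAppName("dyn-part")))
        // Dynamic partitioning must be switched on before the INSERT
        sqlContext.sql("SET hive.exec.dynamic.partition = true")
        sqlContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
        sqlContext.sql(
          """CREATE TABLE IF NOT EXISTS claims_part (claim_id STRING, amount DOUBLE)
            |PARTITIONED BY (claim_date STRING)
            |STORED AS PARQUET""".stripMargin)
        // Hive routes each row to its partition based on the trailing SELECT column
        sqlContext.sql(
          """INSERT OVERWRITE TABLE claims_part PARTITION (claim_date)
            |SELECT claim_id, amount, claim_date FROM claims""".stripMargin)
      }
    }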

Environment: Hadoop 2.6.0-cdh5.7.0, Java 1.8.0_92, Spark 1.6.0, Spark SQL, R, Python, Scala 2.10.5, MongoDB, Apache Pig 0.12.0, Apache Hive 1.1.0, HDFS, Sqoop, Oozie, Maven, IntelliJ, Git, UNIX shell scripting, Oracle 11g/10g, Log4j, Linux, Agile development

Hadoop/Spark Developer

Confidential - Atlanta, GA

Responsibilities:

  • Developed MapReduce jobs to process documents.
  • Responsible for the SOLR implementation and for setting up collections in SolrCloud.
  • Involved in Hadoop cluster setup and configuring Hadoop ecosystem components.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote code to parse external documents before copying them to HDFS.
  • Developed Spark scripts using Scala as per requirements.
  • Developed and tuned HBase ingestion for documents.
  • Developed a web application to search and ingest documents in Solr using the SolrJ API (see the first sketch after this list).
  • Developed Spark jobs using Scala for processing locomotive events.
  • Responsible for interacting with business partners, gathering requirements and preparing technical design documents.
  • Developed a service-oriented architecture (SOA) based design for the application.
  • Responsible for writing detailed design documents, class diagrams and sequence diagrams.
  • Developed composite components using JSF 2.0.
  • Coordinated with the onsite team and clients.
  • Prepared and executed unit test cases.
  • Involved in integration testing and user acceptance support.
  • Involved in production support.
  • Collaborated with product/business users, data scientists and other engineers to define requirements and to design, build and tune complex solutions.
  • Involved in business requirement gathering, analysis and preparation of design documents.
  • Involved in Solr collection preparation and schema creation.
  • Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
  • Involved in debugging and fine-tuning the Solr cluster and queries.
  • Involved in importing document data from external systems into HDFS.
  • Developed Spark Streaming applications to process real-time events and ingest emails and instant messages into HBase and Elasticsearch (see the second sketch after this list).
  • Managed and allocated tasks for onsite and offshore resources.
  • Involved in setting up Kerberos and authenticating from the web application.
  • Involved in refactoring the existing application to improve its performance.
  • Interacted with the client to map legacy data to SCOPE-specific data.
  • Developed Java service classes to interface between the application and external systems.
  • Wrote SQL queries to create the batch table.
  • Involved in the build process and ran the deployment procedure in the UNIX environment on a regular basis.
  • Monitored log files on a regular basis in the UNIX environment.
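
A minimal sketch of indexing a document through the SolrJ API, as referenced above; the ZooKeeper hosts, collection name and field names are hypothetical, and the constructor shown is the Solr 5.x-era CloudSolrClient:

    import org.apache.solr.client.solrj.impl.CloudSolrClient
    import org.apache.solr.common.SolrInputDocument

    object SolrIndexer {
      def main(args: Array[String]): Unit = {
        // Client pointed at the ZooKeeper ensemble backing SolrCloud
        val client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")
        client.setDefaultCollection("documents")          // hypothetical collection
        val doc = new SolrInputDocument()
        doc.addField("id", "doc-0001")
        doc.addField("title_t", "Quarterly maintenance report")
        doc.addField("body_t", "Extracted text of the document goes here")
        client.add(doc)
        client.commit()                                   // make the document searchable
        client.close()
      }
    }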
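
And a minimal sketch of the Spark-Streaming-to-HBase ingestion described above; the table name, column family and the (rowKey, body) shape of the stream are hypothetical, and the client calls are the HBase 1.x Connection/Table API:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.streaming.dstream.DStream

    object HBaseSink {
      // Write (rowKey, message body) pairs from each streaming batch into HBase
      def save(events: DStream[(String, String)]): Unit =
        events.foreachRDD { rdd =>
          rdd.foreachPartition { partition =>
            // Open the connection per partition: HBase clients are not serializable,
            // so they cannot be created on the driver and shipped to executors
            val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
            val table = conn.getTable(TableName.valueOf("messages")) // hypothetical table
            partition.foreach { case (rowKey, body) =>
              val put = new Put(Bytes.toBytes(rowKey))
              put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("body"), Bytes.toBytes(body))
              table.put(put)
            }
            table.close()
            conn.close()
          }
        }
    }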

Environment: Hortonworks Data Platform (HDP 2.3), Hadoop, HDFS, Spark, Kafka, Hive, Solr 5.2.1, HBase, Sqoop, Sun Solaris, Elasticsearch 2.0.0, RSA, PrimeFaces, JSF, RAD 8/8.5, AngularJS, WebSphere Application Server 8/8.5, Java 1.7, Subversion, EJB 3.0, Oracle 11g.

Hadoop Developer

Confidential - Arlington, VA

Responsibilities:

  • Installed Hadoop, MapReduce and HDFS, and developed multiple data-processing jobs in Pig and Hive.
  • Used Impala to read, write and query Hadoop data in HDFS, and configured Kafka to read and write messages from external programs.
  • Used Pig as an ETL tool to perform transformations, joins and some pre-aggregations before storing the data in HDFS.
  • Created stored procedures to transform the data and worked extensively in SQL on the various transformations needed while loading data.
  • Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection, permission checks and performance analysis.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames (see the sketch after this list).
  • Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
  • Responsible for bulk-loading data into HBase using MapReduce by directly creating HFiles and loading them.
  • Implemented Spark applications in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
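
A minimal sketch of rewriting a HiveQL aggregate as Spark DataFrame transformations, as described above; the orders table and its columns are hypothetical, and the API shown is the Spark 1.x HiveContext:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions.sum

    object HiveToSpark {
      def main(args: Array[String]): Unit = {
        val sqlContext = new HiveContext(new SparkContext(new SparkConf().setAppName("hive-to-spark")))
        import sqlContext.implicits._

        // HiveQL being replaced:
        //   SELECT customer_id, SUM(amount) AS total
        //   FROM orders WHERE status = 'SHIPPED' GROUP BY customer_id
        val totals = sqlContext.table("orders")
          .filter($"status" === "SHIPPED")
          .groupBy($"customer_id")
          .agg(sum($"amount").as("total"))

        totals.write.mode("overwrite").parquet("/data/out/order_totals")
      }
    }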

Environment: Cloudera, Hadoop, HDFS, Hive, Impala, Spark SQL, Python, Sqoop, Oozie, Storm, Spark, Scala, MySQL, shell scripting

Hadoop Developer

Confidential, CA

Responsibilities:

  • Created the project using Hive, BigSQL and Pig.
  • Involved in data modeling in Hadoop.
  • Created Hive tables and worked on them using HiveQL (see the sketch after this list).
  • Wrote Apache Pig scripts to process HDFS data.
  • Automated tasks using UNIX shell scripts.
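
A minimal sketch of the HiveQL table work described above, shown through a Spark HiveContext to keep all the examples in one language; the path, table and columns are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object WebLogsTable {
      def main(args: Array[String]): Unit = {
        val sqlContext = new HiveContext(new SparkContext(new SparkConf().setAppName("web-logs")))
        // External table over files already landed in HDFS by upstream jobs
        sqlContext.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
            |  ts STRING, user_id STRING, url STRING)
            |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
            |LOCATION '/data/raw/web_logs'""".stripMargin)
        sqlContext.sql("SELECT user_id, COUNT(*) AS hits FROM web_logs GROUP BY user_id").show()
      }
    }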

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Scala, Python, HBase, Oozie, YARN, Spark, Core Java, Oracle, SQL, Ubuntu/UNIX, Eclipse, Maven, JDBC drivers, Mainframe, MySQL, Linux, AWS, XML, CRM, SVN, PDSH, PuTTY, BigInsights
