
Hadoop Developer Resume


Santa Clara, CA

SUMMARY

  • 5+ years of experience with Big Data Hadoop ecosystem tools such as HDFS, MapReduce, Spark, YARN, Hive, Sqoop, Flume, Storm, Pig, HBase and Cassandra in distributed systems.
  • Experience with Cloudera and Hortonworks Hadoop distributions and Amazon EMR on AWS.
  • Experience in all phases of the SDLC, including application design, development, support, maintenance, testing, change request management and enhancement support for the client.
  • Developed Spark applications for data transformations and loading into HDFS using RDDs, DataFrames and Datasets.
  • Experience developing applications using Spark Core and Spark SQL, with good knowledge of Spark Streaming.
  • Performance tuning in Hive and Impala using methods including, but not limited to, dynamic partitioning, bucketing, indexing, file compression and cost-based optimization.
  • Experience using the data ingestion tools Kafka, Flume and Sqoop.
  • Experience handling different file formats such as JSON, Avro, ORC and Parquet.
  • Experience analyzing data in NoSQL databases such as HBase and MongoDB.
  • Experience connecting Tableau to different data sources and creating dashboards and worksheets.
  • Experience converting Hive/SQL queries into Spark transformations (see the sketch after this list).
  • Programming experience in Java and Scala.
  • Hands-on with UNIX commands, shell scripting and setting up cron jobs.
  • Experience in software configuration management using Git and Bitbucket.
  • Good experience using the relational databases Oracle, SQL Server and PostgreSQL.
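A minimal sketch of the kind of Hive-to-Spark conversion mentioned above, assuming a hypothetical Hive table named sales with region, year and amount columns (not a table from the actual projects):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkExample")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL (hypothetical table and columns):
    //   SELECT region, SUM(amount) AS total_amount
    //   FROM sales WHERE year = 2017 GROUP BY region;

    // Equivalent Spark DataFrame transformations
    val totals = spark.table("sales")
      .filter(col("year") === 2017)
      .groupBy("region")
      .agg(sum("amount").alias("total_amount"))

    totals.show()
    spark.stop()
  }
}

Expressing the query as DataFrame transformations lets Spark's Catalyst optimizer plan the aggregation instead of relying on the Hive execution engine.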

TECHNICAL SKILLS

Hadoop Components: HDFS, Hue, MapReduce, Pig, Hive, HBase, Sqoop, Impala, Oozie, ZooKeeper, Flume, Kafka, YARN and Cloudera Manager.

Spark Components: Apache Spark, DataFrames, Spark SQL, Spark on YARN, Pair RDDs.

Databases: Microsoft SQL Server, MySQL, Oracle, HBase, MongoDB and Cassandra.

Programming Languages: C, C++, Java, Scala, Shell Scripting.

Web Servers: Apache HTTP Server and Tomcat.

IDEs: Eclipse, PyCharm, IntelliJ.

OS/Platforms: Windows, Linux (all major distributions), CentOS.

PROFESSIONAL EXPERIENCE

Confidential, Santa Clara, CA

Hadoop Developer

Responsibilities:

  • Created data pipelines for different mobile application events to filter and load consumer response data from Urban Airship in an AWS S3 bucket into Hive external tables at an HDFS location.
  • Worked with different file formats such as JSON, Avro and Parquet and compression techniques such as Snappy.
  • Developed UDFs in Spark to capture the values of key-value pairs in encoded JSON strings.
  • Developed SQL scripts for end-user/analyst requirements for ad-hoc analysis.
  • Used various Hive optimization techniques such as partitioning, bucketing and map joins.
  • Developed shell scripts for adding dynamic partitions to the Hive stage table, verifying JSON schema changes in source files, and checking for duplicate files in the source location.
  • Developed a Spark application to filter JSON source data in an AWS S3 location and store it in HDFS with partitions, and used Spark to extract the schema of the JSON files (see the sketch after this list).
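A minimal sketch of the kind of Spark job described in the last two bullets; the bucket, paths and field names (payload, device_type, event_type, event_date) are hypothetical placeholders, not the production pipeline:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object EventFilterJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("EventFilterJob")
      .getOrCreate()

    // UDF that pulls a single key out of an encoded JSON string column;
    // the column name "payload" and key "device_type" are hypothetical.
    val extractKey = udf { (json: String, key: String) =>
      Option(json).flatMap { j =>
        val pattern = ("\"" + key + "\"\\s*:\\s*\"([^\"]*)\"").r
        pattern.findFirstMatchIn(j).map(_.group(1))
      }.orNull
    }

    // Read raw event JSON from S3; Spark infers the schema from the files.
    val events = spark.read.json("s3a://example-bucket/mobile-events/")

    // Filter consumer-response events and write partitioned Parquet to HDFS.
    events
      .withColumn("device_type", extractKey(col("payload"), lit("device_type")))
      .filter(col("event_type") === "consumer_response")
      .write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("hdfs:///data/events/consumer_response/")

    spark.stop()
  }
}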

Environment: Hive, Spark 2.0, AWS S3, EMR, Jenkins, Shell scripting, HBase, Airflow, IntelliJ IDEA, Sqoop, Java.

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:

  • Loaded home mortgage data from the existing DWH tables (Teradata) into HDFS using Sqoop.
  • Created Hive tables and loaded retail transactional data from Teradata using Sqoop.
  • Developed workflows in Oozie to automate the tasks of loading the data into HDFS.
  • Wrote Impala/Hive queries to provide a consolidated view of the mortgage and retail data.
  • Created multiple Hive tables and implemented dynamic partitioning and buckets in Hive for efficient data access.
  • Involved in creating Hive external tables and used custom SerDes based on the structure of the input files so that Hive knows how to load the files into Hive tables.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Developed multiple modules in Scala for data cleaning and pre-processing jobs in the Spark environment (see the sketch after this list).
  • Ingested credit bureau data streams into the data lake via Flume.
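A minimal sketch of one of the Scala data-cleaning modules mentioned above, written against the Spark 1.6 HiveContext API; the mortgage_stage table and its loan_id, property_state and loan_amount columns are hypothetical:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.hive.HiveContext

object MortgageCleaning {
  // Drop duplicates, fill missing categorical values and remove bad rows.
  def clean(raw: DataFrame): DataFrame =
    raw
      .dropDuplicates(Seq("loan_id"))
      .na.fill(Map("property_state" -> "UNKNOWN"))
      .withColumn("loan_amount", col("loan_amount").cast("decimal(18,2)"))
      .filter(col("loan_amount") > 0)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MortgageCleaning"))
    val hiveContext = new HiveContext(sc)

    // The stage table is assumed to have been loaded from Teradata via Sqoop.
    val raw = hiveContext.table("mortgage_stage")
    clean(raw).write.mode("overwrite").saveAsTable("mortgage_clean")

    sc.stop()
  }
}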

Environment: Hadoop, HDFS, Spark 1.6, Sqoop, Hive, Pig, Flume, Oozie, ZooKeeper, Cloudera Distribution (CDH 5.6.1), Impala, Eclipse.

Confidential, CA

Hadoop developer

Responsibilities:

  • Designed documents and specifications for near real-time data analytics using Hadoop and HBase.
  • Installed Cloudera Manager on the clusters.
  • Used a 20-node cluster with the Cloudera Hadoop distribution on Amazon EC2.
  • Developed ad-click-based data analytics for keyword analysis and insights.
  • Crawled public posts from Facebook and tweets from Twitter.
  • Wrote MapReduce jobs to extract product sentiment and published the results to the Data Science team (see the sketch after this list).
  • Converted the output to structured data and imported it into Spotfire with the analytics team.
  • Defined problems to look for the right data and analyzed results to scope new projects.
  • Used TIBCO Spotfire with an in-house custom application to perform analysis and generate analytics.
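A minimal sketch of a sentiment keyword-counting MapReduce job of the kind mentioned above, written in Scala against the Hadoop MapReduce API; the keyword list and input/output paths are hypothetical, not the production lexicon:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConverters._

// Emits (keyword, 1) for every sentiment keyword found in a crawled post.
class SentimentMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val keywords = Set("love", "great", "terrible", "broken") // hypothetical lexicon
  private val one = new IntWritable(1)

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.toLowerCase.split("\\W+")
      .filter(keywords.contains)
      .foreach(word => context.write(new Text(word), one))
}

// Sums the counts for each sentiment keyword.
class SentimentReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    val total = values.asScala.map(_.get).sum
    context.write(key, new IntWritable(total))
  }
}

object SentimentCountJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "sentiment-keyword-count")
    job.setJarByClass(SentimentCountJob.getClass)
    job.setMapperClass(classOf[SentimentMapper])
    job.setReducerClass(classOf[SentimentReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))   // input: crawled posts
    FileOutputFormat.setOutputPath(job, new Path(args(1))) // output: keyword counts
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}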

Environment: Hadoop, HBase, HDFS, MapReduce, Java, Spotfire, Cloudera Manager, Amazon EC2

Confidential

Java Developer

Responsibilities:

  • Individually worked on all the stages of the Software Development Life Cycle (SDLC).
  • Used JavaScript code, HTML and CSS style declarations to enrich websites.
  • Implemented the application using the Spring MVC framework, which is based on the MVC design pattern.
  • Designed the user interface and the business logic for customer registration and maintenance.
  • Integrated web services and worked with data on different servers.
  • Involved in designing and developing SOA services using web services.
  • Experience creating tables, views, triggers, indexes, constraints and functions in SQL Server 2005.

Environment: Java, J2EE, JSP, Spring, Struts, Hibernate, Eclipse, SOA, WebLogic, Oracle, HTML, CSS, Web Services, JUnit, SVN, Windows, UNIX.
