
Hadoop & Spark Developer Resume

Chicago, IL

SUMMARY

  • Around 5 years of professional experience in Information Technology, including 3 years developing Big Data and Hadoop ecosystem applications and 2 years of extensive experience in Java technologies, database development, and data analytics.
  • Expertise in Hadoop components: Hive, Hue, Pig, Sqoop, HBase, Impala, Flume, Oozie, and Apache Spark.
  • Experience writing Pig Latin and HiveQL scripts and extending their functionality with User Defined Functions (UDFs).
  • Hands-on experience with performance-optimization techniques in Hive, Impala, and Spark.
  • Strong exposure to various file formats (Parquet, Avro, and JSON) and compression codecs (Snappy and Gzip).
  • Hands-on experience with Spark Core, Spark SQL, and the DataFrame, Dataset, and RDD APIs.
  • Developed data-processing applications using Spark and Scala.
  • Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions (see the sketch after this list).
  • Good knowledge of Spark architecture and real-time streaming using Spark.
  • Hands-on experience spinning up AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
  • Hands-on experience with Amazon Web Services (AWS): Elastic MapReduce (EMR), S3 storage, EC2 instances, and data warehousing.
  • Fluent in core Java concepts such as I/O, multi-threading, exceptions, regular expressions, collections, data structures, and serialization.
  • Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML; good knowledge of J2EE and core Java design patterns.
  • Experience in Java, JSP, Servlets, WebLogic, WebSphere, JavaScript, Ajax, jQuery, XML, and HTML.
  • Experience writing stored procedures and complex SQL queries against relational databases such as Oracle, SQL Server, and MySQL.
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data-warehouse tools for reporting and analysis.
  • Hands-on experience in application development using Java, RDBMSs, and Linux shell scripting.
  • Well-versed in Agile/Scrum and Waterfall methodologies.
  • Strong team player with good communication, analytical, presentation, and interpersonal skills.
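
For illustration, a minimal Java sketch of the Hive-script-to-Spark migration pattern referenced above: a Hive aggregation rewritten as DataFrame transformations and an action. This is a sketch, not project code; the table and column names (sales.orders, region, amount) are hypothetical, and the original work was done in Scala.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.sum;

    public class HiveScriptToSpark {
        public static void main(String[] args) {
            // Hive support lets Spark read and write metastore tables directly.
            SparkSession spark = SparkSession.builder()
                    .appName("hive-to-spark-dataframe")
                    .enableHiveSupport()
                    .getOrCreate();

            // DataFrame equivalent of the Hive script:
            //   SELECT region, SUM(amount) AS total FROM sales.orders GROUP BY region;
            Dataset<Row> totals = spark.table("sales.orders")
                    .groupBy(col("region"))
                    .agg(sum(col("amount")).alias("total"));

            // Persist the aggregate back to the warehouse as a managed table.
            totals.write().mode(SaveMode.Overwrite).saveAsTable("sales.region_totals");
            spark.stop();
        }
    }

With Hive support enabled, saveAsTable writes the result back into the metastore, which is what lets a DataFrame job drop in where a scheduled Hive script used to run.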

TECHNICAL SKILLS

Programming Languages: SQL, Java, J2EE, Scala and Linux shell scripting

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Kafka, Oozie, Flume, ZooKeeper, and Spark; Cloudera and Hortonworks distributions.

Databases & NoSQL: Oracle, Teradata, MySQL, SQL Server, DB2; familiar with NoSQL (HBase)

Scripting & Query Languages: Linux Shell scripting, SQL and PL/SQL.

Hadoop Paradigms: MapReduce, YARN, in-memory computing, high availability, and real-time streaming.

Other Tools: Eclipse, IntelliJ, SVN, GitHub, Jira, Kanban, Bitbucket.

Cloud Components: AWS (S3, EMR, EC2, CloudFormation), Azure (SQL Database & Data Factory)

PROFESSIONAL EXPERIENCE

Confidential, Chicago, IL

Hadoop & Spark Developer

Responsibilities:

  • Involved in the complete big-data flow of the application, from ingesting upstream data into HDFS to processing and analyzing it there.
  • Developed a Spark API to import data from Teradata into HDFS and created Hive tables on top of it.
  • Developed Sqoop jobs to import data in Avro format from an Oracle database and created Hive tables on top of it.
  • Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded them from the Avro Hive tables.
  • Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL using Python and Scala.
  • Involved in performance tuning of Hive from the design, storage, and query perspectives.
  • Developed a Flume ETL job reading from an HTTP source and sinking to HDFS.
  • Collected JSON data from the HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
  • Developed Spark Core and Spark SQL scripts in Scala for faster data processing.
  • Developed Kafka consumer APIs in Scala for consuming data from Kafka topics (see the consumer sketch after this list).
  • Involved in designing and developing HBase tables to store aggregated data from Hive tables.
  • Integrated Hive with Tableau Desktop reports and published them to Tableau Server.
  • Developed shell scripts to run Hive scripts in Hive and Impala.
  • Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
  • Used Jira for bug tracking and Bitbucket to check code changes in and out.
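
A minimal sketch of a Kafka consumer along the lines described above. The original consumers were written in Scala; this Java version uses the standard Kafka client API, and the broker address, group id, and topic name are placeholders.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class TopicConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "ingest-group");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events"));
                while (true) {
                    // Poll the topic and hand each record to downstream processing.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("offset=%d key=%s value=%s%n",
                                record.offset(), record.key(), record.value());
                    }
                }
            }
        }
    }

In practice the record-handling loop would feed the downstream Spark or Hive ingestion step rather than print to stdout.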

Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, Linux shell scripting, Cloudera.

Confidential, Grapevine, TX

Hadoop Developer

Responsibilities:

  • Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
  • Wrote MapReduce code to parse data from various sources and store the parsed data in HBase and Hive.
  • Worked on combiners, partitioners, and the distributed cache to improve the performance of MapReduce jobs.
  • Developed a shell script to perform data profiling on the ingested data with the help of Hive bucketing.
  • Developed a Hive UDF that applies a hashing mechanism to a Hive column (see the UDF sketch after this list).
  • Wrote Hive validation scripts used in a validation framework (for daily analysis, presented to business users through graphs).
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig and Hive.
  • Developed Python code to use the MapReduce framework via Hadoop Streaming.
  • Used Pig as an ETL tool for transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Imported customer-specific personal data into Hadoop using Sqoop from relational databases such as Netezza and Oracle.
  • Developed testing scripts in Python, prepared test procedures, analyzed test result data, and suggested system and software improvements.
  • Streamed log data using Flume and performed data analytics using Hive.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Extracted data from RDBMSs (Oracle, MySQL, and Teradata) into HDFS using Sqoop.
  • Worked on a Spark POC comparing the execution times of existing Hive MapReduce jobs against equivalent Spark jobs.
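
A minimal sketch of a hashing Hive UDF of the kind described above. SHA-256 is assumed here as the hashing mechanism (the actual algorithm is not specified), the class and function names are illustrative, and it needs hive-exec and commons-codec on the classpath.

    import org.apache.commons.codec.digest.DigestUtils;
    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    @Description(name = "hash_column",
                 value = "_FUNC_(str) - returns the SHA-256 hex digest of str")
    public final class HashColumnUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;          // preserve NULL semantics in Hive
            }
            return new Text(DigestUtils.sha256Hex(input.toString()));
        }
    }

Once packaged into a jar and added to the session, the function would be registered with something like CREATE TEMPORARY FUNCTION hash_column AS 'HashColumnUDF' and applied to the target column in HiveQL.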

Environment: Hadoop, MapReduce, HDFS, Pig, HiveQL, Oozie, Flume, Impala, Cloudera, MySQL, UNIX Shell Scripting, Tableau, Python, Spark.

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in full life-cycle development in a distributed environment using Java and the J2EE framework.
  • Designed the application with the Struts framework, based on MVC architecture.
  • Designed and developed the front end using JSP, HTML, JavaScript, and jQuery.
  • Implemented the web-service client for login authentication, credit reports, and applicant information using Apache Axis2 web services.
  • Extensively worked on the user interface for several modules using JSPs, JavaScript, and Ajax.
  • Developed a data-processing framework using design patterns, Java, and XML.
  • Used the lightweight container of the Spring Framework to provide architectural flexibility through Inversion of Control (IoC) (see the DAO sketch after this list).
  • Used the Hibernate ORM framework with Spring for data persistence and transaction management.
  • Designed and developed session beans to implement the business logic.
  • Developed EJB components deployed on WebLogic Application Server.
  • Wrote unit tests using the JUnit framework; logging was done with Log4j.
  • Designed and developed various configuration files for Hibernate mappings.
  • Designed and documented REST/HTTP APIs, including JSON data formats and an API versioning strategy.
  • Developed web services for sending and receiving data between applications using SOAP messages.
  • Actively involved in code reviews and bug fixing.
  • Applied CSS (Cascading Style Sheets) across the entire site for standardization.
  • Assisted the QA team in defining and implementing a defect-resolution process, including defect priority and severity.
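
A minimal sketch of the Spring-plus-Hibernate persistence approach described above: a DAO receiving its SessionFactory by setter injection from Spring's IoC container. The Account entity and the DAO itself are hypothetical stand-ins for the project's actual domain classes.

    import org.hibernate.SessionFactory;

    public class AccountDao {
        private SessionFactory sessionFactory;

        // Setter injection: Spring's IoC container supplies the SessionFactory,
        // so the DAO never constructs its own persistence plumbing.
        public void setSessionFactory(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        public Account findById(Long id) {
            // getCurrentSession() joins the Spring-managed transaction.
            return (Account) sessionFactory.getCurrentSession().get(Account.class, id);
        }

        public void save(Account account) {
            sessionFactory.getCurrentSession().saveOrUpdate(account);
        }
    }

    // Minimal stub entity for illustration; the real mapping would live in a
    // Hibernate mapping file, as noted above.
    class Account {
        private Long id;
        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
    }

In a Spring 2.0 context this DAO would typically be declared as a <bean> whose sessionFactory property references a LocalSessionFactoryBean, keeping the wiring in configuration rather than code.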

Environment: Java 5.0, Struts, Spring 2.0, Hibernate 3.2, WebLogic 7.0, Eclipse 3.3, Oracle, JUnit 4.2, Maven, Windows XP, HTML, CSS, JavaScript, and XML.
