- Around 5 years of professional experience in Information Technology, including 3 years developing Big Data and Hadoop Ecosystem applications and 2 years of extensive experience in Java technologies, database development, and data analytics.
- Expertise in Hadoop components - Hive, Hue, Pig, Sqoop, HBase, Impala, Flume, Oozie and Apache Spark.
- Experience in writing Pig Latin and HiveQL scripts and extending their functionality with User Defined Functions (UDFs).
- Hands on experience with performance optimization techniques in Hive, Impala, Spark.
- Strong exposure to various file formats (Parquet, Avro, and JSON) and compression codecs (Snappy and Gzip).
- Hands-on experience with Spark Core, Spark SQL, and the DataFrame/Dataset/RDD APIs.
- Developed applications using Spark and Scala for data processing.
- Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions.
- Good knowledge on Spark architecture and real-time streaming using Spark.
- Hands-on experience spinning up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
- Hands-on experience with AWS (Amazon Web Services): Elastic MapReduce (EMR), S3 storage, EC2 instances, and data warehousing.
- Fluent in core Java concepts such as I/O, multithreading, exceptions, regular expressions, collections, data structures, and serialization.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and core Java design patterns.
- Experience in Java, JSP, Servlets, WebLogic, WebSphere, JavaScript, Ajax, jQuery, XML, and HTML.
- Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server and MySQL.
- Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Well-versed in Agile/Scrum and Waterfall methodologies.
- Strong team player with good communication, analytical, presentation, and interpersonal skills.
Programming Languages: SQL, Java, J2EE, Scala and Linux shell scripting
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Kafka, Oozie, Flume, Zookeeper, Spark, Cloudera and Hortonworks.
Databases & NoSQL: Oracle, Teradata, MySQL, SQL Server, DB2; familiar with NoSQL (HBase)
Scripting & Query Languages: Linux Shell scripting, SQL and PL/SQL.
Hadoop Paradigms: MapReduce, YARN, in-memory computing, high availability, and real-time streaming.
Other Tools: Eclipse, IntelliJ, SVN, GitHub, Jira, Kanban, Bitbucket.
Cloud Components: AWS (S3 buckets, EMR, EC2, CloudFormation), Azure (SQL Database & Data Factory)
Confidential, Chicago, IL
Hadoop & Spark Developer
- Involved in the complete big data flow of the application, from ingesting data from upstream sources into HDFS through processing and analyzing the data in HDFS.
- Developed a Spark API to import data from Teradata into HDFS and created Hive tables.
- Developed Sqoop jobs to import data in Avro file format from an Oracle database and created Hive tables on top of it.
- Created Partitioned and Bucketed Hive tables in Parquet File Formats with Snappy compression and then loaded data into Parquet hive tables from Avro hive tables.
- Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL using Python and Scala.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Developed a Flume ETL job with an HTTP source and an HDFS sink.
- Collected JSON data from the HTTP source and developed Spark APIs to perform inserts and updates on Hive tables.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
- Involved in designing and developing HBase tables and storing aggregated data from Hive tables.
- Integrated Hive and Tableau Desktop reports and published to Tableau Server.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Orchestrated a number of Sqoop and Hive scripts using Oozie workflows, scheduled with the Oozie coordinator.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, Linux shell scripting, Cloudera.
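The partitioned and bucketed Hive tables above rely on Hive routing each row to a bucket file by hashing the clustering column modulo the bucket count. A minimal pure-Python sketch of that routing (the column values and bucket count are hypothetical; Hive's actual implementation follows Java `String.hashCode` semantics, approximated here):

```python
NUM_BUCKETS = 8  # would correspond to CLUSTERED BY (...) INTO 8 BUCKETS in the DDL

def java_string_hashcode(s: str) -> int:
    """Approximate Java's String.hashCode(), which Hive uses for string bucket keys."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Interpret the result as a signed 32-bit integer, as Java does.
    return h - 0x100000000 if h >= 0x80000000 else h

def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Bucket index = (hash & Integer.MAX_VALUE) % num_buckets."""
    return (java_string_hashcode(key) & 0x7FFFFFFF) % num_buckets

# Hypothetical customer keys routed to bucket files:
for key in ["cust-001", "cust-002", "cust-003"]:
    print(key, "-> bucket", bucket_for(key))
```

Because the same key always hashes to the same bucket, bucketed joins and sampling can skip reading the other bucket files.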
Confidential, Grapevine, TX
- Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
- Wrote MapReduce code to parse data from various sources and store the parsed data in HBase and Hive.
- Worked on combiners, partitioners, and the distributed cache to improve the performance of MapReduce jobs.
- Developed shell scripts to perform data profiling on ingested data with the help of Hive bucketing.
- Developed a Hive UDF to apply a hashing mechanism to Hive columns.
- Wrote Hive validation scripts used in a validation framework (for daily analysis through graphs presented to business users).
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed workflow in Oozie to automate the tasks of loading data into HDFS and pre-processing with Pig and Hive.
- Developed Python code for the MapReduce framework via Hadoop Streaming.
- Used Pig as an ETL tool for transformations, joins, and some pre-aggregations before storing the data in HDFS.
- Imported all customer-specific personal data into Hadoop using the Sqoop component from relational databases such as Netezza and Oracle.
- Developed test scripts in Python, prepared test procedures, analyzed test result data, and suggested improvements to the system and software.
- Streamed log data using Flume and performed data analytics using Hive.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Extracted the data from RDBMS (Oracle, MySQL & Teradata) to HDFS using Sqoop.
- Worked on a Spark POC comparing the execution times of existing MapReduce jobs written in Hive against equivalent Spark jobs.
Environment: Hadoop, MapReduce, HDFS, Pig, HiveQL, Oozie, Flume, Impala, Cloudera, MySQL, UNIX Shell Scripting, Tableau, Python, Spark.
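The Python-on-Hadoop-Streaming work above follows the classic mapper/reducer pattern: Hadoop pipes raw lines to the mapper over stdin, sorts the emitted key/value pairs by key, and pipes them to the reducer. A minimal word-count-style sketch (input lines are hypothetical; the sort here stands in for Hadoop's shuffle phase):

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit one tab-separated (word, 1) pair per token, as Streaming expects."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(pairs):
    """Reduce phase: input arrives sorted by key; sum the counts for each word."""
    parsed = (p.split("\t") for p in pairs)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Simulate the Streaming pipeline end to end: map, shuffle/sort, reduce.
lines = ["error login failed", "error timeout", "info login ok"]
shuffled = sorted(mapper(lines))  # Hadoop performs this sort between the two stages
for out in reducer(shuffled):
    print(out)
```

In a real job the two functions would live in separate scripts passed to `hadoop jar hadoop-streaming.jar -mapper ... -reducer ...`, with stdin/stdout in place of the in-memory lists.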
- Involved in full life cycle development in a distributed environment using Java and the J2EE framework.
- Designed the application by implementing Struts Framework based on MVC Architecture.
- Implemented the web service client for login authentication, credit reports, and applicant information using the Apache Axis2 web service.
- Developed framework for data processing using Design patterns, Java, XML.
- Used the lightweight container of the Spring Framework to provide architectural flexibility through Inversion of Control (IoC).
- Used the Hibernate ORM framework with the Spring Framework for data persistence and transaction management.
- Designed and developed Session beans to implement the Business logic.
- Developed EJB components deployed on the WebLogic Application Server.
- Wrote unit tests using the JUnit framework; logging was done using the Log4j framework.
- Designed and developed various configuration files for Hibernate mappings.
- Designed and documented REST/HTTP APIs, including JSON data formats and API versioning strategy.
- Developed Web Services for sending and getting data from different applications using SOAP messages.
- Actively involved in code reviews and bug fixing.
- Applied CSS (Cascading Style Sheets) across the entire site for standardization.
- Assisted QA Team in defining and implementing a defect resolution process including defect priority, and severity.
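The REST/HTTP API versioning strategy mentioned above can be illustrated with URL-path versioning, where each version keeps a stable JSON shape so old clients keep working. A hypothetical sketch (paths, field names, and record layout are all illustrative, not the actual API):

```python
import json

def applicant_v1(record):
    """v1 JSON shape: flat fields, kept unchanged for existing clients."""
    return {"name": record["name"], "ssn_last4": record["ssn"][-4:]}

def applicant_v2(record):
    """v2 JSON shape: name moved into a nested identity object."""
    return {"identity": {"name": record["name"]}, "ssn_last4": record["ssn"][-4:]}

# The version is encoded in the URL path; both versions are served side by side.
ROUTES = {"/v1/applicants": applicant_v1, "/v2/applicants": applicant_v2}

def handle(path, record):
    """Dispatch on the versioned path and serialize the response as JSON."""
    return json.dumps(ROUTES[path](record))

print(handle("/v1/applicants", {"name": "Jane Doe", "ssn": "123456789"}))
```

Breaking changes land only in a new path prefix, which is what lets the old and new JSON formats coexist during client migration.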