Hadoop Developer Resume
OBJECTIVE:
Seeking a position as a Hadoop Developer that applies strong knowledge of Java, Hive, SQL, and Spark to collaborative program development.
PROFESSIONAL SUMMARY:
- 3+ years of experience as a Software Engineer developing applications using Big Data, Java/J2EE, and SQL.
- 3+ years of experience with Big Data Hadoop ecosystem tools such as MapReduce, YARN, HDFS, HBase, Impala, Hive, Pig, Oozie, Airflow, and Apache Spark for storage, querying, and analysis of data.
- Hands-on experience with Cloudera and Apache Hadoop distributions.
- Improved performance in Hive and Impala using methods including dynamic partitioning, bucketing, file compression, vectorization, and cost-based optimization.
- Hands-on experience handling different file formats such as JSON, Avro, and Parquet.
- Used Pig as an ETL tool for transformations, joins, and filters, and developed Pig UDFs when needed.
- Developed UDFs and UDAFs in Java as needed for use in Hive queries.
- Integrated Hive, Impala, and Spark with Tableau using ODBC/HiveServer2 for data visualization.
- Worked on analyzing data in NoSQL databases such as Cassandra.
- Developed applications using PySpark for data filtering, creating Spark RDDs and DataFrames and using caching.
- Developed PySpark UDFs in Python for JSON data filtering (see the sketch after this list).
- Hands-on experience with Python programming and libraries such as boto for AWS S3 access.
- Good knowledge of and hands-on experience with Scala on Spark.
- Hands-on experience with AWS (Amazon Web Services): Elastic MapReduce (EMR), S3 storage, EC2 instances, and Redshift data warehousing.
- Hands-on programming experience in Java and Scala.
- Experience with UNIX commands, setting up cron jobs, and deploying applications to servers.
- Architected, designed, and maintained high-performing ETL processes.
- Proficient in RDBMS concepts such as views, joins, ORDER BY, and GROUP BY with Oracle, SQL Server, and MySQL.
- Experience in software configuration management using Git.
- Proficient in SDLC methodologies such as Agile Scrum and Waterfall, operating under CMMI guidelines.
- Experience managing and leading developer teams to deliver end-to-end projects.
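Illustrative sketch of the PySpark JSON filtering and caching mentioned above; the S3 path, column names, and event key are hypothetical placeholders rather than project code.

    # Minimal PySpark sketch: pull one key out of JSON-encoded payloads with a
    # Python UDF, filter on it, cache the result, and write partitioned Parquet.
    # Bucket, column, and key names are hypothetical placeholders.
    import json

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("json-filter-sketch").getOrCreate()

    def extract_event_type(payload):
        """Return the 'event_type' key from a JSON string, or None if malformed."""
        try:
            return json.loads(payload).get("event_type")
        except (TypeError, ValueError):
            return None

    extract_event_type_udf = udf(extract_event_type, StringType())

    events = spark.read.json("s3a://example-bucket/raw/events/")   # hypothetical path
    filtered = (events
                .withColumn("event_type", extract_event_type_udf(events["payload"]))
                .filter("event_type = 'open'")
                .cache())                                          # reused downstream

    filtered.write.mode("overwrite").partitionBy("event_date").parquet("/data/events/filtered")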
TECHNICAL SKILLS:
Hadoop Components: HDFS, Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Impala, ZooKeeper, Flume, Kafka, YARN, Airflow, and Cloudera Manager.
Spark Components: Apache Spark, DataFrames, Spark SQL, Spark on YARN, Pair RDDs.
Server Side Scripting: UNIX Shell Scripting.
Databases: Confidential SQL Server, MySQL.
Programming Languages: Java, Scala.
Web Servers: Windows Server 2005/2008 and Apache Tomcat.
IDE: Eclipse, IntelliJ IDEA.
OS/Platforms: Windows 2005/2008, Linux (All major distributions), Unix.
NoSQL Databases: HBase, Cassandra.
Currently Exploring: Apache Flink, Drill, Tachyon.
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop Developer
Responsibilities:
- Created data pipelines for different events of Confidential's applications to filter consumer response data from Urban Airship in an AWS S3 bucket and load it into Hive external tables at an HDFS location.
- Involved in designing the data pipeline for Confidential's consumer data from different vendors.
- Worked with different file formats such as JSON, Avro, and Parquet and compression codecs such as Snappy.
- Used Impala for ad hoc queries that required low latency and quick results.
- Used various Hive optimization techniques and resolved production issues for failed jobs.
- Developed Python code defining the tasks, dependencies, SLA watchers, and time sensors for each job for workflow management and automation with Airflow, as sketched below.
- Developed shell scripts for adding dynamic partitions to Hive staging tables, verifying JSON schema changes in source files, and checking for duplicate files in the source location.
- Developed Python UDFs for Spark to capture the value of a particular key in encoded JSON strings.
- Developed a PySpark application to filter JSON source data in an AWS S3 location and store it in HDFS with partitions, and used Spark to extract the schema of the JSON files.
- Checked Airflow logs for job-failure error messages and resolved the underlying issues.
- Used Bitbucket as the code repository and Jenkins as the continuous integration tool.
- Used AWS EMR and Cloudera clusters depending on the use case and the volume of data.
- Participated in daily scrum meetings to discuss project progress.
- Developed new-hire documentation for production support and process workflows.
Environment: Hive, Spark, Python, AWS S3, EMR, Cloudera, Jenkins, Bitbucket, Sqoop, shell scripting, Cassandra, Airflow, PyCharm, Impala.
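Illustrative sketch of an Airflow DAG of the kind described in this role, with a time sensor, a per-task SLA, and explicit dependencies; the DAG id, schedule, and commands are hypothetical placeholders, assuming Airflow 1.x-style imports.

    # Minimal Airflow DAG sketch (Airflow 1.x-style imports): a time sensor gates
    # two downstream jobs; the SLA flags tasks that run too long.
    # IDs, schedule, and commands are hypothetical placeholders.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.sensors.time_delta_sensor import TimeDeltaSensor

    default_args = {
        "owner": "data-eng",
        "start_date": datetime(2018, 1, 1),
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(hours=2),            # alert on tasks running past two hours
    }

    with DAG("consumer_events_daily", default_args=default_args,
             schedule_interval="@daily", catchup=False) as dag:

        wait_for_source = TimeDeltaSensor(task_id="wait_for_source",
                                          delta=timedelta(hours=1))

        add_partitions = BashOperator(task_id="add_partitions",
                                      bash_command="hive -f add_partitions.hql")

        filter_events = BashOperator(task_id="filter_events",
                                     bash_command="spark-submit filter_events.py")

        wait_for_source >> add_partitions >> filter_events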
Teaching Assistant
Confidential
Responsibilities:
- Teach students about different software architectures used in developing software systems.
- Help students with assignments and projects related to the subject.
- Take classes for students as and when necessary.
Confidential
Java Developer
Responsibilities:
- Wrote Hive and Impala queries to filter the data based on vehicle defects.
- Designed and created Hive external tables and views using a shared metastore with partitioning, dynamic partitioning, and buckets (see the sketch after this list).
- Developed Hive queries to filter the data and load it into final tables for each manufacturer.
- Used Hive optimization techniques such as map-side joins, merging, and parallel execution, and stored the data in HDFS in Parquet file format.
- Evaluated the performance of Spark, Impala, and Hive for different queries.
- Explored different use cases where Spark can be used instead of Hive to load data.
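Illustrative sketch of the kind of partitioned, bucketed Hive external table and session-level optimization settings described above, issued from Python through the PyHive client; the host, table, and column names are hypothetical placeholders.

    # Minimal sketch: create a partitioned, bucketed external table stored as
    # Parquet, then enable the settings behind map-side joins, parallel execution,
    # and dynamic partitioning. Host, table, and column names are hypothetical.
    from pyhive import hive

    conn = hive.connect(host="hive-server.example.com", port=10000)  # hypothetical host
    cur = conn.cursor()

    cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS vehicle_defects (
        vin STRING,
        defect_code STRING,
        reported_ts TIMESTAMP
    )
    PARTITIONED BY (manufacturer STRING)
    CLUSTERED BY (vin) INTO 32 BUCKETS
    STORED AS PARQUET
    LOCATION '/data/vehicle_defects'
    """)

    cur.execute("SET hive.exec.dynamic.partition.mode=nonstrict")  # dynamic partitions
    cur.execute("SET hive.auto.convert.join=true")                 # map-side joins
    cur.execute("SET hive.exec.parallel=true")                     # parallel stage execution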
Confidential
Java Developer
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Loaded home mortgage data from the existing DWH tables (SQL Server) into HDFS using Sqoop, as sketched below.
- Created Hive tables and loaded retail transactional data from Teradata using Sqoop.
- Developed workflows in Oozie to automate the tasks of loading the data into HDFS.
- Wrote Hive queries to build a consolidated view of the mortgage and retail data.
- Created multiple Hive tables and implemented partitioning, dynamic partitioning, and buckets in Hive for efficient data access.
- Involved in creating Hive external tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs; used custom SerDes based on the structure of the input files so that Hive knows how to load them into Hive tables.
- Exported analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
- Experience in using the Tableau Data Integration tool for data integration, OLAP analysis, and ETL processes.
Environment: Hadoop, HDFS, MapReduce, Sqoop, Hive, Pig, Flume, Oozie, ZooKeeper, Cloudera distribution, MySQL, Eclipse.
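Illustrative sketch of the kind of Sqoop import described above, wrapped in a small Python helper; the JDBC URL, credentials file, table, and target directory are hypothetical placeholders.

    # Minimal sketch: import one DWH table from SQL Server into HDFS with Sqoop.
    # Connection details, table, and paths are hypothetical placeholders.
    import subprocess

    def sqoop_import_table(table, target_dir):
        """Run a Sqoop import for a single table and fail loudly if it errors."""
        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:sqlserver://dwh-host.example.com:1433;databaseName=dwh",
            "--username", "etl_user",
            "--password-file", "/user/etl/.sqoop_password",   # password kept in HDFS
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", "4",
        ]
        subprocess.run(cmd, check=True)   # raises CalledProcessError on failure

    if __name__ == "__main__":
        sqoop_import_table("home_mortgage", "/data/raw/home_mortgage")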
Confidential
Java Developer
Responsibilities:
- Individually worked on all the stages of the Software Development Life Cycle (SDLC).
- Used JavaScript, HTML, and CSS style declarations to enrich websites.
- Implemented the application using the Spring MVC Framework, which is based on the MVC design pattern.
- Designed the user interface and the business logic for customer registration and maintenance.
- Integrated web services and worked with data on different servers.
- Involved in designing and developing SOA services using web services.
- Understood the requirements of business users and end users.
- Experience creating UML class and sequence diagrams.
- Experience in creating tables, views, triggers, indexes, constraints, and functions in SQL Server 2005.
- Worked on content management for versioning and notifications.
Environment: Java, J2EE, JSP, Spring, Struts, Hibernate, Eclipse, SOA, WebLogic, Oracle, HTML, CSS, Web Services, JUnit, SVN, Windows, UNIX.