Hadoop Developer Resume
Charlotte, NC
PROFESSIONAL SUMMARY:
- 8+ years of IT experience in software development, big data management, data modeling, data integration, and the implementation and testing of enterprise-class systems spanning big data frameworks, advanced analytics, and Java/J2EE technologies.
- 3+ years of hands-on experience with Hadoop components and MapReduce programming for parsing and populating tables with terabytes of data.
- Extensive use of Sqoop, Flume, and Oozie for data ingestion into HDFS and the Hive warehouse.
- Hands-on performance tuning for data processing in Hive, Impala, Spark, Pig, and MapReduce, using techniques including dynamic partitioning, bucketing, and file compression.
- Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Expertise in ingesting data from HBase into Solr.
- Experienced in importing data from different sources using StreamSets.
- Experience with Cloudera, Hortonworks & MapR Hadoop distributions.
- Hands-on experience with Spark SQL for various business use cases.
- Used Spark SQL and Scala APIs to query and transform data residing in Hive (see the sketch following this summary).
- Used Python for Spark SQL jobs to speed up data processing.
- Replaced existing MapReduce jobs with Spark Streaming and Spark transformations for more efficient data processing.
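A minimal Scala sketch of the Spark SQL and dynamic-partitioning work summarized above; the table and column names (transactions, txn_summary, customer_id, amount, txn_date) are hypothetical stand-ins, not taken from any actual project:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: query a Hive table with Spark SQL and write the aggregate back
// into a Hive table partitioned dynamically by date.
object HiveSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-spark-sql-sketch")
      .enableHiveSupport() // requires hive-site.xml on the classpath
      .getOrCreate()

    // Enable Hive dynamic partitioning so each distinct txn_date value
    // becomes its own partition at insert time
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    spark.sql(
      """CREATE TABLE IF NOT EXISTS txn_summary (customer_id STRING, total_amount DOUBLE)
        |PARTITIONED BY (txn_date STRING) STORED AS PARQUET""".stripMargin)

    // Aggregate in Spark SQL; the partition column goes last in the SELECT list
    spark.sql(
      """INSERT OVERWRITE TABLE txn_summary PARTITION (txn_date)
        |SELECT customer_id, SUM(amount) AS total_amount, txn_date
        |FROM transactions
        |GROUP BY txn_date, customer_id""".stripMargin)

    spark.stop()
  }
}
```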
CORE COMPETENCIES:
- Hadoop Development & Troubleshooting
- Data Analysis
- Data Visualization & Reporting in Tableau
- Real-time Streaming using Spark
- MapReduce Programming
- Performance Tuning of Hive & Impala
- Ingesting data from HBase to Solr
- Data import using StreamSets
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, Spark, ZooKeeper, Solr, StreamSets.
Apache Spark: Spark, Spark SQL, Spark Streaming, Scala.
ETL Tools: Informatica with Hadoop connector, Pentaho, Alteryx
Programming & Scripting Languages: Java, C, Scala, SQL, Unix Shell Scripting, Python
Java Technologies: JQuery, JSP, Servlets.
SQL Databases: Oracle, SQL Server 2012, SQL Server 2008 R2, DB2, Teradata
NoSQL Databases: MongoDB, HBase.
Development tools: Maven, Eclipse, IntelliJ, PyCharm
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Created external Hive tables to move data from different sources into the Cloudera cluster.
- Tracked the data after each load and kept it updated on daily and weekly schedules.
- Performed SQL joins across Hive tables to consolidate them into a single table.
- Ingested data from Hive to HBase and from HBase to Solr using Spark.
- Ingested data into Hive from different sources, such as JDBC origins, using StreamSets and Sqoop jobs.
- Imported data from Hive to Solr using StreamSets.
- Set up near-real-time indexing into Solr as an automated, scheduled process.
- Worked on a POC to pull in third-party data, using Spark SQL to create a SchemaRDD, load it into Hive tables, and structure the data.
- Developed a Flume ETL job handling data from an HTTP source with an HDFS sink.
- Worked closely with administrators to set up Kerberos authentication.
- Worked closely with web developers on application usage and on pulling data from Solr and HBase to populate the front end.
- Used Spark SQL to load JSON data, create a SchemaRDD, load it into Hive tables, and handle the structured data (see the sketch after this position).
- Implemented Spark SQL queries in Python for faster data processing.
Environment: Hive, HDFS, HBase, Solr, StreamSets, Spark, Kafka, Scala, IntelliJ, Python, PyCharm.
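A minimal Scala sketch of the JSON-to-Hive flow described above; the input path, field names, and table name are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: load JSON into a DataFrame (the successor to the Spark 1.x
// SchemaRDD), query it with Spark SQL, and persist it into a Hive table.
object JsonToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Load JSON from HDFS; Spark infers the schema from the documents
    val events = spark.read.json("hdfs:///data/incoming/events/*.json")

    // Handle the structured data with plain Spark SQL
    events.createOrReplaceTempView("events_raw")
    val cleaned = spark.sql(
      "SELECT event_id, event_type, event_ts FROM events_raw WHERE event_id IS NOT NULL")

    // Persist into a Hive table for downstream consumers
    cleaned.write.mode("append").saveAsTable("staging.events")

    spark.stop()
  }
}
```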
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Developed data pipelines using Sqoop and Flume to store data in HDFS for further processing in Spark.
- Created Hive tables with periodic backups and wrote complex Hive/Impala queries to run on Impala.
- Implemented partitioning and bucketing in Hive and applied file formats and compression techniques as optimizations.
- Created Hive generic UDFs to process business logic that varies by policy.
- Customized the MapReduce framework at different levels, including input formats, data types, custom SerDes, and partitioners.
- Pushed data to a Windows mount location for Tableau to import for reporting.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Migrated MapReduce jobs to Spark RDD transformations and streamed data using Spark Streaming (see the sketch after this position).
- Developed Spark scripts in Scala with Spark SQL to access Hive tables for faster data processing.
- Configured build scripts for multi-module projects with Maven.
- Automated workflow scheduling using Oozie and AutoSys.
Environment: Hadoop, Cloudera, HDFS, Hive, Spark, Sqoop, Flume, Java, Scala, Shell scripting, Impala, Eclipse, Tableau, MySQL.
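A minimal Scala sketch of the MapReduce-to-Spark-Streaming migration described above, watching the HDFS directory a Flume sink lands files in; the path and record layout are illustrative assumptions:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: replace a batch MapReduce count with RDD-style transformations
// over micro-batches of newly landed files.
object StreamingMigrationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-migration-sketch")
    val ssc = new StreamingContext(conf, Seconds(60)) // 60-second micro-batches

    // New files written by the Flume HDFS sink become a DStream of lines
    val lines = ssc.textFileStream("hdfs:///data/flume/landing")

    // The same map/reduce logic as the old MR job, as DStream transformations
    val counts = lines
      .map(line => (line.split(",")(0), 1L)) // key on the first CSV field
      .reduceByKey(_ + _)

    counts.print() // replace with a real sink (HDFS, Hive, HBase) in production

    ssc.start()
    ssc.awaitTermination()
  }
}
```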
Confidential
Java Developer
Responsibilities:
- Involved in requirements analysis, design, coding, and testing.
- Developed the application following Agile Scrum.
- Developed and implemented the MVC architectural pattern using the Struts framework, including JSP, Servlets, EJB, and Action classes.
- Performed object-oriented analysis and design using UML, including class, sequence, and state diagrams, created in Microsoft Visio.
- Wrote client-side validations using JavaScript and CSS.
- Designed and developed the UI using Struts view components, HTML, CSS, and JavaScript.
- Developed messaging components using the J2EE JMS API.
- Used Oracle as the database and Toad for query execution; wrote SQL scripts and PL/SQL code for procedures and functions.
- Designed test plans and test cases and performed overall unit testing of the system.
- Prepared documentation and helped prepare the user manual for the application.
Environment: Java, jQuery, JUnit, Servlets, Spring 2.0, WebLogic, Eclipse, JSP, Windows XP, HTML, CSS, JavaScript, XML.