Sr. Hadoop Developer Resume

Nashville, TN

SUMMARY:

  • Overall 10+ years of professional IT experience in Analysis, Design, Development, Testing, Documentation, Deployment, Integration, and Maintenance of web-based and Client/Server applications using SQL and Big Data technologies.
  • Experience in Application Development using Hadoop and related Big Data technologies such as HBase, Hive, Pig, Flume, Oozie, Sqoop, and ZooKeeper.
  • In-depth knowledge of Data Structures, Design and Analysis of Algorithms, and a good understanding of Data Mining and Machine Learning techniques.
  • Excellent knowledge of Hadoop Architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, and Data Node.
  • Well versed in installing, configuring, supporting, and managing Hadoop clusters and the underlying Big Data infrastructure.
  • Hadoop/Big Data technology experience across the storage, querying, processing, and analysis of data.
  • Proficient in designing and developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
  • Hands on experience in installing, configuring, and using Hadoop ecosystem components like HDFS, MapReduce, HBase, Zookeeper, Oozie, Hive, Sqoop, Pig and Flume.
  • Skilled in writing Map Reduce jobs in Pig and Hive.
  • Knowledge in managing and reviewing Hadoop Log files.
  • Expertise in wide array of tools in the Big Data Stack such as Hadoop, Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, Yarn, Oozie, and Zookeeper.
  • Knowledge of streaming the Data to HDFS using Flume.
  • Excellent programming skills with experience in Java, C, SQL, and Python.
  • In-depth and extensive knowledge of analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Experience in tuning and troubleshooting performance issues in Hadoop cluster.
  • Worked on importing data into HBase using HBase Shell and HBase Client API.
  • Hands on experience in using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Extensive experience working on various databases and database script development using SQL and PL/SQL
  • Hands on experience in application development using Java, RDBMS and Linux Shell Scripting.
  • Experience in writing Pig and Hive scripts and extending Hive and Pig core functionality by writing custom UDFs.
  • Knowledge of writing real-time processing with Spark Streaming and Kafka (see the sketch after this list).
  • Involved in HBase setup and in storing data into HBase for further analysis.
  • Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes with JSON and Avro.
  • Supported MapReduce programs running on the cluster and wrote custom MapReduce scripts in Java for data processing.
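
For illustration, a minimal sketch (not from any specific engagement) of the Spark Streaming with Kafka pattern mentioned above, written here as a PySpark Structured Streaming job that lands Kafka messages in HDFS; the broker address, topic, and paths are placeholders, and the spark-sql-kafka package is assumed to be on the classpath.

    from pyspark.sql import SparkSession

    # Hypothetical sketch: consume a Kafka topic and persist the payloads to HDFS as Parquet.
    spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()

    stream = (spark.readStream
              .format("kafka")                                    # requires the spark-sql-kafka package
              .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
              .option("subscribe", "events")                      # placeholder topic
              .load())

    query = (stream.selectExpr("CAST(value AS STRING) AS payload")
             .writeStream
             .format("parquet")
             .option("path", "hdfs:///data/streams/events")            # placeholder output path
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .start())

    query.awaitTermination()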

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Oozie, Zookeeper, Apache Kafka, Cassandra, StreamSets, Impyla, Solr

Programming Languages: Java (JDK 5/JDK 6), C, HTML, SQL, PL/SQL, Python

Client Technologies: jQuery, JavaScript, AJAX, CSS, HTML5, XHTML, D3, AngularJS

Operating Systems: UNIX, Windows, Linux

Application Servers: IBM WebSphere, Tomcat, WebLogic

Web technologies: JSP, Servlets, JNDI, JDBC, Java Beans, JavaScript, Web Services (JAX-WS)

Databases: Oracle 8i/9i/10g & MySQL 4.x/5.x

Java IDE: Eclipse 3.x, IBM WebSphere Application Developer, IBM RAD 7.0

PROFESSIONAL EXPERIENCE:

Sr. Hadoop Developer

Confidential - Nashville, TN

Responsibilities:

  • Developed PySpark code to read data from Hive, group the fields, and generate XML files.
  • Enhanced the PySpark code to write the generated XML files to a directory and zip them into CDAs.
  • Implemented a REST call to submit the generated CDAs to the vendor website.
  • Implemented Impyla to support JDBC/ODBC connections to HiveServer2 (see the sketch after this section).
  • Enhanced the PySpark code to replace the Spark reads with Impyla and performed the Impyla installation on the edge node.
  • Evaluated the performance of the Spark application by testing in cluster deployment mode vs. local mode.
  • Experimented with submitting test OIDs to the vendor website.
  • Explored StreamSets Data Collector and implemented it for ingestion into Hadoop.
  • Created a StreamSets pipeline to parse XML files and convert them to a format fed to Solr.
  • Built a data validation dashboard in Solr to display the message records.
  • Wrote a shell script to run a Sqoop job for bulk data ingestion from Oracle into Hive.
  • Created Hive tables for the ingested data and scheduled an Oozie job to run the Sqoop ingestion.
  • Worked with the JSON file format for StreamSets.
  • Worked with the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
  • Wrote shell scripts to dump data from MySQL to HDFS.
  • Analyzed large volumes of structured data using Spark SQL.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
  • Enhanced and optimized the product's Spark code to aggregate, group, and run data mining tasks using the Spark framework.
  • Used Maven 3.3.9 for building and managing Java-based projects.
  • Hands-on experience with Linux and HDFS shell commands; worked on Kafka for message queuing solutions.
  • Developed unit test cases for Mapper, Reducer, and Driver classes using MRUnit.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, zip, XML, and JSON.
  • Developed Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Wrote HBase client programs in Java and web services.

Environment: Sqoop, StreamSets, Impyla, PySpark, Solr, Oozie, Hive, Impala
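
Below is a condensed, hypothetical sketch of the Impyla-based flow described above: query HiveServer2, group the fields per record, and emit one XML file per group. The host, table, columns, and output directory are placeholders, and field names are assumed to be valid XML tag names.

    import os
    import xml.etree.ElementTree as ET
    from impala.dbapi import connect

    # Connect to HiveServer2 through Impyla (unsecured cluster assumed).
    conn = connect(host="edge-node.example.com", port=10000, auth_mechanism="PLAIN")
    cur = conn.cursor()
    cur.execute("SELECT record_id, field_name, field_value FROM clinical_extract")

    # Group the returned rows by record so each record becomes one XML document.
    groups = {}
    for record_id, field_name, field_value in cur.fetchall():
        groups.setdefault(record_id, []).append((field_name, field_value))

    os.makedirs("cda_out", exist_ok=True)
    for record_id, fields in groups.items():
        root = ET.Element("record", id=str(record_id))
        for field_name, field_value in fields:
            ET.SubElement(root, field_name).text = str(field_value)
        ET.ElementTree(root).write(os.path.join("cda_out", f"{record_id}.xml"))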

Sr. Hadoop Developer

Confidential - Franklin, TN

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote multiple MapReduce programs in Java for data analysis.
  • Wrote MapReduce jobs using Pig Latin and the Java API.
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Developed Pig scripts for analyzing large data sets in HDFS.
  • Collected the logs from the physical machines and the OpenStack controller and integrated into HDFS using Flume.
  • Designed and presented a plan for a POC on Impala.
  • Experienced in migrating HiveQL to Impala to minimize query response time.
  • Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
  • Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements (see the sketch after this section).
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Worked on Sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Implemented daily jobs that automate parallel tasks of loading data into HDFS using Autosys and Oozie coordinator jobs.
  • Performed streaming of data into Apache Ignite by setting up a cache for efficient data analysis.
  • Responsible for performing extensive data validation using Hive.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Used Kafka to load data into HDFS and move data into NoSQL databases (Cassandra).
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Involved in submitting and tracking Map Reduce jobs using Job Tracker.
  • Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Used Pig as an ETL tool to perform transformations, event joins, filtering, and some pre-aggregations.
  • Responsible for cleansing the data from source systems using Ab Initio components such as Join, Dedup Sorted, Denormalize, Normalize, Reformat, Filter-by-Expression, and Rollup.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Wrote Hive generic UDFs to implement business logic.
  • Implemented test scripts to support test driven development and continuous integration.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Teradata, Zookeeper, Autosys, HBase, Cassandra, Apache Ignite
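
As a hedged illustration of the Parquet and partitioning work above (not the project's actual code), here is a short PySpark sketch that lands semi-structured JSON logs as a partitioned, Parquet-backed Hive table; the input path, table name, and columns are invented.

    from pyspark.sql import SparkSession

    # Hypothetical sketch: assumes a Hive-enabled Spark session and a JSON source
    # that contains the columns selected below.
    spark = (SparkSession.builder
             .appName("parquet-hive-sketch")
             .enableHiveSupport()
             .getOrCreate())

    logs = spark.read.json("hdfs:///landing/app_logs/")        # semi-structured input

    (logs.select("event_ts", "level", "message", "event_date")
         .write
         .mode("append")
         .format("parquet")
         .partitionBy("event_date")                             # partition pruning for Hive/Impala queries
         .saveAsTable("app_logs"))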

Hadoop Developer

Confidential - Brentwood, TN

Responsibilities:

  • Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, Zookeeper, and Spark.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying (see the sketch after this section).
  • Used Pig to store the data into HBase.
  • Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HiveQL.
  • Used Pig to parse the data and store it in Avro format.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Implemented a script to transmit information from Oracle to HBase using Sqoop.
  • Worked on tuning the performance of Pig queries.
  • Involved in writing shell scripts for exporting log files to the Hadoop cluster through an automated process.
  • Implemented MapReduce programs to handle semi/unstructured data like XML, JSON, and sequence files for log files.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.

Environment: Hadoop, HDFS, Pig, Sqoop, Spark, MapReduce, Cloudera, Snappy, Zookeeper, NoSQL, HBase, Shell Scripting, Ubuntu, Red Hat Linux.
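
A hedged sketch of the custom aggregate pattern mentioned above, implemented here as a Series-to-scalar pandas UDF in PySpark (requires pandas and pyarrow); the portfolio/latency columns and the 95th-percentile metric are illustrative only.

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.appName("custom-agg-sketch").getOrCreate()

    # Small inline dataset so the example is self-contained; real data came from Hive/HDFS.
    df = spark.createDataFrame(
        [("retail", 120.0), ("retail", 340.0), ("mortgage", 95.0), ("mortgage", 410.0)],
        ["portfolio", "latency_ms"])

    @pandas_udf("double")
    def p95(values: pd.Series) -> float:
        # Series-to-scalar pandas UDF behaves as an aggregate function.
        return float(values.quantile(0.95))

    df.groupBy("portfolio").agg(p95("latency_ms").alias("latency_p95")).show()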
