Sr. Hadoop Developer Resume

Austin, TX

SUMMARY

  • Technically accomplished professional with over 8 years of experience in Information Technology and Enterprise Application Development across multiple industries, including over 4 years of hands-on experience in the Big Data/Hadoop ecosystem and related technologies.
  • Experience in Big Data analytics with hands-on experience in data extraction, transformation, loading, analysis and visualization using the Cloudera platform, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, HBase, Oozie, YARN, Impala, Spark, Scala, Kafka, Ambari and Cassandra.
  • Experience working with the Cloudera and Hortonworks distributions of Hadoop.
  • Substantial experience writing MapReduce jobs in Java and working with Pig, Flume, Tez, ZooKeeper, Hive and Storm.
  • Experience in web-based languages such as HTML, CSS, PHP and XML, and other web methodologies including Web Services and SOAP.
  • Good experience in importing and exporting data between HDFS and relational database management systems using Sqoop.
  • Worked in multi-cluster environments and set up the Cloudera Hadoop ecosystem.
  • Experience in transferring Streaming data from different data sources into HDFS and HBase using Apache Flume.
  • Involved in creating Hive tables, loading them with data and writing ad hoc Hive queries that run internally on MapReduce and Tez.
  • Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
  • Experienced with the Spark framework for both batch and real-time data processing.
  • Experience in writing Spark programs in Scala for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and compressed file formats.
  • Expertise in writing Spark jobs in Scala to process large sets of structured and semi-structured data and store them in HDFS.
  • Experience in converting SQL queries into Spark transformations using Spark RDDs and Scala, and in performing map-side joins on RDDs.
  • Experience in creating Real-Time Data streaming solutions using Apache Spark Streaming.
  • Developed a Flume ETL job to handle data from an HTTP source with HDFS as the sink.
  • Collected JSON data from an HTTP source and developed Spark APIs to perform inserts and updates in Hive tables.
  • Developed a Kafka consumer API in Scala for consuming data from Kafka topics.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Hands-on experience with NoSQL databases including MongoDB, HBase and Cassandra, and their integration with Hadoop clusters.
  • Hands-on experience developing Hadoop clusters in public and private cloud environments such as Amazon AWS and OpenStack.
  • Experience developing Scala applications for loading and streaming data from NoSQL databases (HBase) into HDFS.
  • Good experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
  • Hands-on experience with the Amazon EMR framework, transferring data to EC2 servers.
  • Scheduled MapReduce jobs using the FIFO and Fair schedulers.

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, MongoDB, HBase, Hive, Pig, ZooKeeper, Sqoop, Oozie, Flume, Storm, HDP, Eclipse, Cloudera-desktop

Java/J2EE Technologies: J2EE, Servlets, JSP, JDBC, AJAX, SOAP, WSDL

SDLC Methodologies: Agile, UML, Waterfall

Programming Languages: C, C++, Java, Python, Shell Scripting, Scala, SQL and PL/SQL

Web Technologies: HTML, DHTML, XML, XSLT, JavaScript, CSS

NoSQL Databases: MongoDB, HBase, Cassandra

IDE Tools: Eclipse, NetBeans, WinSCP

Operating Systems: Windows Family, RHEL, Ubuntu, CentOS, Windows Server OS - 2003, 2012, R2.

Scripting Languages: Perl, Shell, Ruby, Python, C, SQL, JavaScript, HTML

Databases: Oracle, DB2, MS SQL Server, MySQL, MS Access, MongoDB

Web Services: WebLogic, WebSphere

Build Tools: PuTTY, Nagios

PROFESSIONAL EXPERIENCE

Confidential, Austin, TX

Sr. Hadoop Developer

Responsibilities:

  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Developed MapReduce/EMR jobs to analyze the data and provide heuristics and reports. The heuristics were used to improve campaign targeting and efficiency.
  • Streamed data into Apache Ignite by setting up caches for efficient data analysis.
  • Created Hive external tables, loaded data into them, and queried the data using HQL.
  • Installed and maintained the Hadoop-Spark cluster from scratch in a plain Linux environment and defined the code outputs as PMML.
  • Integrated Cassandra with Elasticsearch and Hadoop.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data, including event joins and some pre-aggregations, before storing the data in HDFS.
  • Extracted data from Teradata into HDFS, databases and dashboards using Spark Streaming.
  • Involved in migrating MapReduce jobs to Spark jobs, and used Spark SQL and the DataFrame API to load structured and semi-structured data into Spark clusters.
  • Migrated the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift and DynamoDB.
  • Extensive experience in Spark Streaming (version 1.5.2) through the core Spark API, running Scala, Java and Python scripts to transform raw data from several data sources into baseline data.
  • Hands-on expertise in running Spark and Spark SQL on Amazon Elastic MapReduce (EMR).
  • Involved in the requirements and design phases of implementing a streaming Lambda architecture for real-time streaming with Spark and Kafka.
  • Designed an application that receives data from several source systems and ingests it into a PostgreSQL database.
  • Automated all jobs that pull NetFlow data from relational databases and load it into Hive tables using Oozie workflows, and enabled email alerts on failures.
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Set up a Hadoop cluster on Amazon EC2 for a proof of concept (POC).

Environment: Hadoop, MapReduce, Spark, Spark SQL, Kafka, Storm, HDFS (YARN), Hive, Pig, Sqoop, Oozie, Java, SQL, Shell script, Talend, Scala, MongoDB, DynamoDB, Amazon EC2 Server.

Confidential, Texas

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop components.
  • Used Apache Maven to build and configure the application for the MapReduce jobs.
  • Developed a custom file system plug-in for Hadoop so that it can access files on the Data Platform, allowing Hadoop MapReduce programs, HBase, Pig and Hive to access files directly.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Worked with the NoSQL data stores HBase, Cassandra and MongoDB.
  • Extracted and loaded data into the data lake environment (Amazon S3) using Sqoop, where it was accessed by business users and data scientists.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Handled high volumes of data using batch processing, in which a group of transactions is collected over a period of time.
  • Performed transformations such as event joins, bot-traffic filtering and some pre-aggregations using Pig.
  • Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
  • Loaded data into HDFS and extracted data from Oracle into HDFS using Sqoop.
  • Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
  • Installed and configured Cloudera Manager for easy management of the existing Hadoop cluster.
  • Developed workflows in Oozie to automate loading data into HDFS and pre-processing it with Pig.
  • Responsible for managing and reviewing Hadoop log files. Designed and developed data management using MySQL.
  • Wrote Python scripts to parse XML documents and load the data into the database.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis that met the business requirements.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.

Environment: HDFS, Hive, Pig, UNIX, SQL, Java MapReduce, HBase, Sqoop, Oozie, Linux, Data Pipeline, Cloudera Hadoop Distribution, Python, MySQL, Git, MapR-DB.

Confidential, Chicago, IL

Hadoop Engineer

Responsibilities:

  • Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology.
  • Worked on analyzing the Hadoop cluster using various big data analytics tools, including Pig, Hive, and MapReduce.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Used Pig to validate the data ingested using Sqoop and Flume, and pushed the cleansed data set into HBase.
  • Developed a data pipeline using Flume and Sqoop to ingest behavioral data into HDFS for analysis.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with ZooKeeper, Oozie, and Data Pipeline operational services for coordinating the cluster and scheduling workflows.
  • Designed and built the reporting application, which uses Spark SQL to fetch data and generate reports on HBase table data.
  • Extracted the needed data from the server into HDFS and bulk loaded the cleaned data into HBase.
  • Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into the tables, and writing Hive queries to further analyze the logs and identify issues and behavioral patterns.
  • Involved in running MapReduce jobs for processing millions of records.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Hive queries and Pig scripts to analyze large datasets.
  • Involved in importing and exporting the data from RDBMS to HDFS and vice versa using Sqoop.
  • Involved in generating ad hoc reports using Pig and Hive queries.
  • Used Hive to analyze data ingested into HBase through Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Provided operational support for Hadoop and/or MySQL databases.
  • Loaded the aggregated data into Oracle from the Hadoop environment using Sqoop for reporting on the dashboard.

Environment: Apache Hadoop, Cloudera, Hive, HBase, Pig, Sqoop, Zookeeper, Java, Oozie, UNIX Shell Scripting, MapReduce (MRV1).

Confidential

Java Developer

Responsibilities:

  • Developed JMS API using J2EE package.
  • Used JavaScript for client-side validation.
  • Used Struts Framework for implementing the MVC Architecture.
  • Wrote various Struts action classes to implement the business logic.
  • Involved in the design of the project using UML Use Case Diagrams, Sequence Diagrams, Object diagrams, and Class Diagrams.
  • Understood and wrote code for advanced topics such as Java I/O, serialization and multithreading.
  • Wrote JavaScript, HTML, DHTML, CSS, Servlets, and JSP to design the application's GUI.
  • Used display tags in the presentation layer for a better look and feel of the web pages.
  • Used the JMS API for asynchronous communication by putting messages in the message queue.
  • Developed Packages to validate data from Flat Files and insert into various tables in Oracle Database.
  • Provided UNIX scripting to drive automatic generation of static web pages with dynamic news content.
  • Participated in requirements analysis to figure out various inputs correlated with their scenarios in Asset Liability Management (ALM).
  • Assisted design and development teams in identifying DB objects and their associated fields in creating forms for ALM modules.
  • Involved in developing PL/SQL Procedures, Functions, Triggers and Packages to provide backend security and data consistency.
  • Involved in interacting with the Business Analyst and Architect during the Sprint Planning Sessions.
  • Responsible for performing Code Reviewing and Debugging.

Confidential

Java Developer

Responsibilities:

  • Provided L3 application support as the primary on-call contact.
  • Involved in the development of Report Generation module which includes volume statistics report, Sanctions Monitoring Metrics report, and TPS report.
  • Implemented the Online List Management (OLM) and FMM modules using Spring and Hibernate.
  • Wrote various SQL, PL/SQL queries and stored procedures for data retrieval.
  • Created Configuration files for the application using Spring framework.
  • Used HTML, CSS, XML and JavaScript to design a page.
  • Successfully migrated legacy application written in VB6.0 to VB.Net. Used Spring Framework for Dependency Injection and integrated with Hibernate.
  • Developed Web Services to get data from the external system in the form of .txt files and load it into the database.
  • Extensively used the JDBC PreparedStatement to embed SQL queries in the Java code.
  • Developed DAOs (Data Access Objects) using Spring Framework 3.
  • Developed web applications with rich Internet application features using Java applets, Silverlight and Java.
  • Deployed the application, which uses the J2EE architecture model and the Struts framework, first on WebLogic and helped migrate it to the JBoss application server.
  • Developed DTS/SSIS packages to load employee details into row Money tables of the SQL server for further processing.
  • Worked on tuning of back-end stored procedures using TOAD.

Environment: Core Java, J2EE, JSP, Servlets, JQuery, JavaScript, CSS, HTML, SQL, VB, .Net.
