
Hadoop Developer Resume


Jessup, PA

SUMMARY:

  • 7+ years of extensive hands-on experience with the Hadoop ecosystem stack, including HDFS, MapReduce, Sqoop, Hive, Pig, HBase, Oozie, Flume, Kafka, ZooKeeper, and Spark.
  • Experience with different Hadoop distributions, including Cloudera (CDH) and Hortonworks Data Platform (HDP).
  • Comfortable working with various facets of the Hadoop ecosystem, covering real-time and batch processing of structured and unstructured data.
  • Experience with NoSQL databases like HBase as well as other ecosystem components such as ZooKeeper, Oozie, Impala, Storm, Spark Streaming/SQL, Kafka, and Flume.
  • Expertise in delivering analytics projects using Big Data technologies.
  • Hands on experience in ingesting data from external servers to Hadoop.
  • Experience in moving large volumes of log, streaming event, and transactional data using Flume.
  • Hands on experience developing workflows that execute Sqoop, Pig, Hive and Shell scripts using Oozie.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Good experience with Hive data warehousing concepts such as static/dynamic partitioning, bucketing, managed and external tables, and join operations on tables.
  • Proficient in building user-defined functions (UDFs) in Hive and Pig to analyze data and extend HiveQL and Pig Latin functionality.
  • Experience in working with Spark transformations and actions on RDDs, Spark SQL, and DataFrames in Python (a minimal PySpark sketch follows this summary).
  • Experience in implementing a unified data ingestion platform using Kafka producers and consumers (a producer/consumer sketch also follows this summary).
  • Experience in implementing near real-time event processing and analytics using Spark Streaming.
  • Proficient with Flume topologies for data ingestion from streaming sources into Hadoop.
  • Well versed with major Hadoop distributions: Cloudera and Hortonworks.
  • Experienced with the Eclipse and NetBeans IDEs.
  • Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
  • Very good development experience with Agile methodology.
  • Strong experience in distinct phases of the Software Development Life Cycle (SDLC), including planning, design, development, and testing of software applications.
  • Excellent leadership, interpersonal, problem solving and time management skills.
  • Excellent communication skills both written (documentation) and verbal (presentation).
  • Very responsible and a good team player; can work independently with minimal supervision.
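
As a concrete illustration of the Spark experience above (transformations and actions on RDDs, plus DataFrames, in Python), the following is a minimal PySpark sketch; the sample data and column names are illustrative only, not taken from any project.

from pyspark.sql import SparkSession

# Single entry point for both the RDD and DataFrame APIs.
spark = SparkSession.builder.appName("rdd-dataframe-sketch").getOrCreate()
sc = spark.sparkContext

# RDD transformations (filter, map) are lazy; the action (reduce) triggers execution.
orders = sc.parallelize([("electronics", 250.0), ("books", 40.0), ("electronics", 99.0)])
electronics_total = (orders
                     .filter(lambda kv: kv[0] == "electronics")   # transformation
                     .map(lambda kv: kv[1])                       # transformation
                     .reduce(lambda a, b: a + b))                 # action
print(electronics_total)

# The same data as a DataFrame, aggregated through the Spark SQL engine.
df = spark.createDataFrame(orders, ["category", "amount"])
df.groupBy("category").sum("amount").show()

spark.stop()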
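
For the Kafka-based ingestion noted above, this is a minimal producer/consumer sketch using the kafka-python client; the broker address, topic name, and event fields are placeholders, not actual project settings.

from kafka import KafkaProducer, KafkaConsumer
import json

# Producer: serialize events as JSON and publish them to a topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("clickstream", {"user": "u123", "page": "/home"})
producer.flush()

# Consumer: read the same topic as part of a consumer group, starting from the earliest offset.
consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",
    group_id="ingestion-pipeline",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.value)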

PROFESSIONAL EXPERIENCE:

Hadoop Developer

Confidential - Jessup, PA

Responsibilities:

  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters with agile methodology.
  • Monitored multiple Hadoop cluster environments using Ganglia; tracked workload, job performance, and capacity planning using Cloudera Manager.
  • Gained thorough hands-on experience with Hadoop, Java, SQL, and Python.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers and mobile and network devices, and pushed it to HDFS.
  • Participated in functional reviews, test specification, and documentation review.
  • Developed MapReduce programs on log data to transform it into a structured form and derive user location, age group, and time spent.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website (a query sketch follows this list).
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation by Business Intelligence tools.
  • Proactively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Documented system processes and procedures for future reference; responsible for managing data coming from different sources.
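
The web-log analysis described above was written in HiveQL; the sketch below expresses the same kind of aggregations through PySpark's spark.sql so it stays consistent with the other Python examples. The web_logs table and its columns (visit_date, visitor_id, duration_seconds, action, product_id) are assumed names, not the actual schema.

from pyspark.sql import SparkSession

# enableHiveSupport lets spark.sql() run HiveQL-style queries against Hive tables.
spark = SparkSession.builder.appName("weblog-analysis").enableHiveSupport().getOrCreate()

# Unique visitors, page views, and average visit duration per day.
spark.sql("""
    SELECT visit_date,
           COUNT(DISTINCT visitor_id) AS unique_visitors,
           COUNT(*)                   AS page_views,
           AVG(duration_seconds)      AS avg_visit_duration
    FROM web_logs
    GROUP BY visit_date
""").show()

# Most purchased product on the website, assuming purchases are flagged in an action column.
spark.sql("""
    SELECT product_id, COUNT(*) AS purchases
    FROM web_logs
    WHERE action = 'purchase'
    GROUP BY product_id
    ORDER BY purchases DESC
    LIMIT 1
""").show()

spark.stop()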

Environment: Hadoop, HDFS, MapReduce, Flume, Pig, Sqoop, Hive, Oozie, Ganglia, HBase, Shell Scripting.

Hadoop Developer

Confidential - Pittsburgh, PA

Responsibilities:

  • Worked on Spark SQL to handle structured data in Hive.
  • Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization (a DDL sketch follows this list).
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
  • Worked on complex MapReduce programs to analyze data residing on the cluster.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Wrote Hive UDFs to sort struct fields and return complex data types (a PySpark analogue is sketched after this list).
  • Worked in AWS environment for development and deployment of custom Hadoop applications.
  • Involved in creating Shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and move the data inside and outside of HDFS.
  • Created files and tuned SQL queries in Hive using HUE.
  • Involved in collecting and aggregating large amounts of log data using Storm and staging data in HDFS for further analysis.
  • Created the Hive external tables using Accumulo connector.
  • Managed real time data processing and real time Data Ingestion in MongoDB and Hive using Storm.
  • Created custom Solr query components to optimize search matching.
  • Developed Spark scripts by using Python shell commands.
  • Stored the processed results in the data warehouse and maintained the data using Hive.
  • Worked with the Spark ecosystem using Spark SQL and Scala queries on different formats such as text and CSV files.
  • Created Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Worked with NoSQL databases like MongoDB, creating collections to load large sets of semi-structured data.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Worked with Amazon Web Services (AWS) cloud services such as EC2, S3, and EMR.
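
For the Hive table work above (partitions and buckets for optimization), the following is a minimal sketch of the kind of DDL involved, wrapped in spark.sql purely to keep the examples in Python; the table, columns, bucket count, partition value, and paths are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-ddl-sketch").enableHiveSupport().getOrCreate()

# External table partitioned by load date and bucketed by customer_id for join/sampling efficiency.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales (
        order_id    STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS ORC
    LOCATION '/data/warehouse/sales'
""")

# Register a partition for a specific load date, then query it with partition pruning.
spark.sql("ALTER TABLE sales ADD IF NOT EXISTS PARTITION (load_date = '2017-01-01')")
spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales
    WHERE load_date = '2017-01-01'
    GROUP BY customer_id
""").show()

spark.stop()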
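
The Hive UDFs mentioned above were written against Hive's Java UDF API; as a lighter-weight Python analogue, the sketch below registers a Spark SQL UDF that returns a complex (struct) type. The function, schema, and sample data are illustrative only.

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

# The UDF returns a struct (a complex type) rather than a plain scalar.
name_schema = StructType([
    StructField("first", StringType()),
    StructField("last", StringType()),
    StructField("length", IntegerType()),
])

@udf(returnType=name_schema)
def split_name(full_name):
    # Split "First Last" into a struct; tolerate single-token names.
    parts = full_name.split(" ", 1)
    first = parts[0]
    last = parts[1] if len(parts) > 1 else ""
    return (first, last, len(full_name))

df = spark.createDataFrame([("Ada Lovelace",), ("Plato",)], ["full_name"])
df.select(split_name("full_name").alias("name_struct")).show(truncate=False)

spark.stop()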

Environment: HDFS, MapReduce, Storm, Hive, Pig, Sqoop, MongoDB, Apache Spark, Python, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, UNIX Shell scripts, HUE, SOLR, GIT, Maven.

Hadoop Consultant

Confidential - Phoenix, AZ

Responsibilities:

  • Automation of data pulls into HDFS from MySQL server and Oracle DB using Sqoop.
  • Analyzing source data tables for best possible loading strategies.
  • Involved in various SDLC stages of the project, including planning, hardware and software estimation, and installation.
  • Developed shell scripts to perform various ETL jobs, such as creating staging and final tables.
  • Implemented a two-level staging process for data validation.
  • Extracted data from staging tables and analyzed data using Impala.
  • Implemented ad-hoc queries using Impala and created tables with partitioning and bucketing to load data.
  • Created a Spark application to process and stream data from Kafka to MySQL (a streaming sketch follows this list).
  • Implemented Hive incremental updates using a four-step strategy to load incremental data from RDBMS systems (also sketched after this list).
  • Implemented and configured optimization techniques such as bucketing, partitioning, and file formats.
  • Used Spark to analyze data in HIVE, HBase and HDFS.
  • Involved in Hadoop Cluster Administration that includes adding and removing Cluster Nodes, Cluster Capacity Planning, and Performance Tuning.
  • Worked on Hadoop clusters capacity Planning and Management.
  • Monitored and debugged Hadoop jobs and applications running in production.
  • Wrote Pig scripts to read data from HDFS and write into Hive tables.
  • Performance-tuned Hive ETL scripts, Pig scripts, and MR jobs in the production environment by altering job parameters.
  • Provided hourly/weekly/monthly aggregation reports required by clients through Spark.
  • Worked on data processing, mainly converting unstructured data to semi-structured data and loading it into Hive and HBase tables.
  • Loaded log data into HDFS using Flume.
  • Wrote Apache Pig scripts to process HDFS data.
  • Developed Spark SQL scripts with Python for analysis and demo purposes.
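
For the Kafka-to-MySQL Spark application above, the following is a minimal Structured Streaming sketch (it assumes the spark-sql-kafka package and a MySQL JDBC driver are on the classpath); the broker, topic, schema, and connection settings are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-mysql").getOrCreate()

# Expected shape of the JSON events on the topic.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the Kafka topic as a streaming DataFrame and parse the JSON payload.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "orders")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_to_mysql(batch_df, batch_id):
    # Each micro-batch is appended to a MySQL table over JDBC.
    (batch_df.write
     .format("jdbc")
     .option("url", "jdbc:mysql://localhost:3306/analytics")
     .option("dbtable", "orders_stream")
     .option("user", "etl_user")
     .option("password", "etl_password")
     .mode("append")
     .save())

query = events.writeStream.foreachBatch(write_to_mysql).start()
query.awaitTermination()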
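
The four-step incremental-update strategy above is commonly implemented as: (1) Sqoop incrementally imports changed rows into an incremental table, (2) a reconcile view merges base and incremental data keeping the latest row per key, (3) the view is materialized into a reporting table, and (4) the base table is overwritten and the incremental table cleared. The sketch below shows steps 2-4 in HiveQL issued through spark.sql; the table and column names are assumptions.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-incremental-merge").enableHiveSupport().getOrCreate()

# Step 1 (outside this script): Sqoop imports changed rows into base_table_incremental.

# Step 2: reconcile view keeps only the latest version of each key across base + incremental.
spark.sql("""
    CREATE OR REPLACE VIEW reconcile_view AS
    SELECT t.id, t.field, t.modified_date
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY modified_date DESC) AS rn
        FROM (
            SELECT * FROM base_table
            UNION ALL
            SELECT * FROM base_table_incremental
        ) merged
    ) t
    WHERE t.rn = 1
""")

# Step 3: materialize the reconciled data into a reporting table.
spark.sql("DROP TABLE IF EXISTS reporting_table")
spark.sql("CREATE TABLE reporting_table AS SELECT * FROM reconcile_view")

# Step 4: overwrite the base table with the reconciled data and clear the incremental table.
spark.sql("INSERT OVERWRITE TABLE base_table SELECT * FROM reporting_table")
spark.sql("TRUNCATE TABLE base_table_incremental")

spark.stop()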

Environment: MapReduce, Spark, HDFS, Pig, HBase, Oozie, Zookeeper, Sqoop, Linux, Kafka, Hadoop, Maven, NoSQL, MySQL, Hive, Java, Eclipse, Python.

Hadoop Developer

Confidential

Responsibilities:

  • Worked on Hadoop Ecosystem using different big data analytic tools including Hive, Pig.
  • Involved in loading data from LINUX file system to HDFS.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Implemented Partitioning, Bucketing in Hive.
  • Worked on different file formats (ORCFILE, TEXTFILE) and different Compression Codecs (GZIP, SNAPPY, LZO).
  • Worked with multiple Input Formats such as Text File, Key Value, and Sequence File Input Format.
  • Experienced in running Hadoop Streaming jobs to process terabytes of JSON-format data (a mapper sketch follows this list).
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Created HBase tables to store various data formats of incoming data from different portfolios.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Developed the verification and control process for daily load.
  • Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
  • Worked collaboratively with different teams to move the project smoothly to production.
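
For the Hadoop Streaming work above, this is a minimal Python mapper sketch that parses JSON log lines and emits tab-separated key/value pairs; the event_type field is an assumed name. A matching reducer would sum the counts per key, and both scripts would be submitted with the standard hadoop-streaming jar.

#!/usr/bin/env python
"""Hadoop Streaming mapper: parse JSON log lines and emit <event_type, 1> pairs."""
import json
import sys

def main():
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except ValueError:
            # Skip malformed JSON rather than failing the whole task.
            continue
        event_type = record.get("event_type", "unknown")
        # Hadoop Streaming expects tab-separated key/value pairs on stdout.
        print("%s\t1" % event_type)

if __name__ == "__main__":
    main()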

Environment: HDFS, Pig, Hive, Sqoop, Shell Scripting, HBase, ZooKeeper, MySQL.

Software Developer

Confidential

Responsibilities:

  • Performed analysis for the client requirements based on the developed detailed design documents.
  • Developed Use Cases, Class Diagrams, Sequence Diagrams and Data Models using Microsoft Visio.
  • Developed Struts forms and actions for validation of user request data and application functionality.
  • Developed a web service using SOAP, WSDL, XML, and SOAP UI.
  • Developed JSPs with Struts custom tags and implemented JavaScript validation of data.
  • Involved in developing the business tier using stateless session beans.
  • Used JavaScript for web page validation and the Struts Validator for server-side validation.
  • Designed the database and coded SQL, PL/SQL, triggers, and views using IBM DB2.
  • Applied design patterns such as Business Delegate, Data Transfer Object, and Data Access Object.
  • Developed Message Driven Beans for asynchronous processing of alerts.
  • Used ClearCase for source code control and JUnit for unit testing.
  • Simulated networks in real time using an ns-3 network simulator modified for multithreading across multiple cores, implemented on a generic Linux machine.
  • Involved in peer code reviews and performed integration testing of the modules.

Environment: Struts, JSPs with Struts, JDBC, Struts Validator, SQL, PL/SQL, IBM DB2, JUnit, Java/J2EE, JSP, Servlets, EJB 2.0, SQL Server, Oracle 9i, JBoss & WebLogic Server 6, JavaScript.

TECHNICAL SKILLS:

Programming Languages: C, C++, Java (core), J2EE, UNIX Shell Scripting, Python

Web Languages: HTML, JavaScript, CSS

Hadoop Ecosystem: MapReduce, HBase, Hive, Pig, Sqoop, ZooKeeper, Oozie, Flume, HUE, Kafka, AWS EMR, Spark, Spark SQL

NoSQL Databases: HBase, MongoDB

Relational Databases: Oracle, MySQL, SQL Server, IBM DB2

Virtualization & Cloud Tools: Amazon AWS, VMware, Virtualbox

Visualization Tools: Power BI, Tableau

Web/Application Servers: Apache Tomcat

Version Control Tools: GIT and SVN

Operating Systems: Windows, Linux (Ubuntu, Red Hat, CentOS)

IDE Platforms: Eclipse, NetBeans, Visual Studio

Methodologies: Agile, SDLC
