Hadoop Developer Resume
Jessup, PA
SUMMARY:
- 7+ years of extensive hands-on experience with the Hadoop ecosystem stack, including HDFS, MapReduce, Sqoop, Hive, Pig, HBase, Oozie, Flume, Kafka, Zookeeper, and Spark.
- Experience with different Hadoop distributions such as Cloudera and Hortonworks Data Platform (HDP).
- Comfortable working with various facets of the Hadoop ecosystem: real-time or batch, structured or unstructured data processing.
- Experience with NoSQL databases like HBase as well as other ecosystem components such as Zookeeper, Oozie, Impala, Storm, Spark Streaming/SQL, Kafka, and Flume.
- Expertise in delivering analytics projects using Big Data technologies.
- Hands-on experience ingesting data from external servers into Hadoop.
- Experience in moving large volumes of log, streaming event, and transactional data using Flume.
- Hands-on experience developing Oozie workflows that execute Sqoop, Pig, Hive, and shell scripts.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Good experience with Hive data warehousing concepts such as static/dynamic partitioning, bucketing, managed and external tables, and join operations on tables.
- Proficient in building user-defined functions (UDFs) in Hive and Pig to analyze data and extend HiveQL and Pig Latin functionality.
- Experience working with Spark transformations and actions on RDDs, and with Spark SQL and DataFrames in Python (a brief sketch follows this summary).
- Experience implementing a unified data ingestion platform using Kafka producers and consumers.
- Experience implementing near real-time event processing and analytics using Spark Streaming.
- Proficient with Flume topologies for data ingestion from streaming sources into Hadoop.
- Well versed with the major Hadoop distributions: Cloudera and Hortonworks.
- Experience with the Eclipse and NetBeans IDEs.
- Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
- Solid development experience working with Agile methodology.
- Strong experience in the distinct phases of the Software Development Life Cycle (SDLC), including planning, design, development, and testing of software applications.
- Excellent leadership, interpersonal, problem solving and time management skills.
- Excellent communication skills both written (documentation) and verbal (presentation).
- Responsible team player who can also work independently with minimal supervision.
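A minimal, hypothetical PySpark sketch of the RDD and DataFrame work summarized above; the file paths, column names, and dataset are illustrative assumptions rather than code from any of the projects below:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("summary-sketch").getOrCreate()

# RDD transformations and actions: count ERROR lines in a hypothetical log directory.
logs = spark.sparkContext.textFile("hdfs:///data/app/logs/*.log")
error_count = logs.filter(lambda line: "ERROR" in line).count()
print("error lines:", error_count)

# Spark SQL / DataFrames: aggregate a hypothetical orders dataset by day.
orders = (spark.read.option("header", "true")
          .csv("hdfs:///data/sales/orders.csv")
          .withColumn("amount", F.col("amount").cast("double")))
daily = (orders.groupBy("order_date")
         .agg(F.count("*").alias("orders"), F.sum("amount").alias("revenue")))
daily.show()

spark.stop()
```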
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential - Jessup, PA
Responsibilities:
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters using Agile methodology.
- Monitored multiple Hadoop cluster environments using Ganglia; tracked workload, job performance, and capacity planning using Cloudera Manager.
- Gained thorough hands-on experience with Hadoop, Java, SQL, and Python.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Participated in functional reviews, test specifications, and documentation review.
- Wrote MapReduce programs on log data to transform it into a structured form and derive user location, age group, and time spent (see the sketch after this section).
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased products on the website.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation by Business Intelligence tools.
- Proactively monitored systems and services; worked on architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Documented system processes and procedures for future reference; responsible for managing data coming from different sources.
Environment: Hadoop, HDFS, MapReduce, Flume, Pig, Sqoop, Hive, Oozie, Ganglia, HBase, Shell Scripting.
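The log analysis above used MapReduce and HiveQL; as a hedged illustration of the unique-visitors-per-day idea only, here is a minimal Hadoop Streaming sketch in Python. The log layout, field positions, and HDFS paths are assumptions, not the original code:

```python
#!/usr/bin/env python
# mapper.py - emit (date, visitor_ip) pairs from web log lines.
# Assumed combined-log layout: "ip - - [dd/Mon/yyyy:HH:MM:SS ...] \"GET /page ...\" ..."
import sys

for line in sys.stdin:
    parts = line.split()
    if len(parts) < 4:
        continue                                   # skip malformed lines
    ip = parts[0]
    date = parts[3].lstrip("[").split(":")[0]      # e.g. 12/Mar/2018
    print("%s\t%s" % (date, ip))
```

```python
#!/usr/bin/env python
# reducer.py - count distinct visitor IPs per date.
# Hadoop Streaming sorts mapper output by key, so all rows for a date arrive together.
import sys

current_date, ips = None, set()
for line in sys.stdin:
    date, ip = line.rstrip("\n").split("\t", 1)
    if date != current_date:
        if current_date is not None:
            print("%s\t%d" % (current_date, len(ips)))
        current_date, ips = date, set()
    ips.add(ip)
if current_date is not None:
    print("%s\t%d" % (current_date, len(ips)))
```

A typical submission would pass both scripts to the hadoop-streaming jar with -mapper/-reducer flags and hypothetical -input/-output HDFS paths.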
Hadoop Developer
Confidential - Pittsburgh, PA
Responsibilities:
- Worked on Spark SQL to handle structured data in Hive.
- Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization (see the sketch after this section).
- Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
- Worked on complex MapReduce programs to analyze data residing on the cluster.
- Analyzed substantial data sets by running Hive queries and Pig scripts.
- Wrote Hive UDFs to sort struct fields and return complex data types.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Involved in creating shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala, and MapReduce) and to move data into and out of HDFS.
- Created files and tuned SQL queries in Hive using HUE.
- Involved in collecting and aggregating large amounts of log data using Storm and staging data in HDFS for further analysis.
- Created the Hive external tables using Accumulo connector.
- Managed real-time data processing and real-time data ingestion into MongoDB and Hive using Storm.
- Created custom Solr query components to optimize search matching.
- Developed Spark scripts in Python using PySpark shell commands.
- Stored the processed results in the data warehouse and maintained the data using Hive.
- Worked with the Spark ecosystem, using Spark SQL and Scala queries on formats such as text and CSV files.
- Created Oozie workflows and coordinator jobs to launch jobs on schedule and on data availability.
- Worked with NoSQL databases such as MongoDB, creating MongoDB collections to load large sets of semi-structured data.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Worked with and learned a great deal about Amazon Web Services (AWS) cloud services such as EC2, S3, and EMR.
Environment: HDFS, MapReduce, Storm, Hive, Pig, Sqoop, MongoDB, Apache Spark, Python, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, UNIX Shell scripts, HUE, SOLR, GIT, Maven.
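A minimal PySpark sketch of the Hive work described above (creating a partitioned table, loading one day of data, and querying it with Spark SQL). The database, table, column names, and paths are illustrative assumptions, not the project's actual schema:

```python
from pyspark.sql import SparkSession

# enableHiveSupport lets Spark SQL use the Hive metastore for table definitions.
spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

# Hypothetical partitioned Hive table for click events, stored as ORC.
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.click_events (
        user_id  STRING,
        url      STRING,
        duration INT
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
""")

# Load one day of raw JSON from a staging area and insert it into that day's partition.
spark.read.json("hdfs:///staging/clicks/2018-06-01/").createOrReplaceTempView("raw_clicks")
spark.sql("""
    INSERT INTO analytics.click_events PARTITION (event_date = '2018-06-01')
    SELECT user_id, url, duration FROM raw_clicks
""")

# Query the structured data back with Spark SQL.
spark.sql("""
    SELECT event_date, COUNT(DISTINCT user_id) AS daily_users
    FROM analytics.click_events
    GROUP BY event_date
""").show()
```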
Hadoop Consultant
Confidential - Phoenix, AZ
Responsibilities:
- Automated data pulls into HDFS from MySQL Server and Oracle databases using Sqoop.
- Analyzed source data tables for the best possible loading strategies.
- Involved in various SDLC stages of the project, such as planning, hardware and software estimation, and installation.
- Developed shell scripts to perform various ETL jobs, such as creating staging and final tables.
- Implemented a two-level staging process for data validation.
- Extracted data from staging tables and analyzed data using Impala.
- Implemented ad-hoc queries using Impala; created tables with partitioning and bucketing to load data.
- Created a Spark application to process and stream data from Kafka to MySQL (see the sketch after this section).
- Implemented Hive incremental updates using a four-step strategy to load incremental data from RDBMS systems.
- Implemented and configured optimization techniques such as bucketing, partitioning, and appropriate file formats.
- Used Spark to analyze data in Hive, HBase, and HDFS.
- Involved in Hadoop cluster administration, including adding and removing cluster nodes, cluster capacity planning and management, and performance tuning.
- Monitored and debugged Hadoop jobs and applications running in production.
- Wrote Pig scripts to read data from HDFS and write it into Hive tables.
- Performance-tuned Hive ETL scripts, Pig scripts, and MapReduce jobs in the production environment by adjusting job parameters.
- Provided various hourly/weekly/monthly aggregation reports required by clients through Spark.
- Worked on data processing, mainly converting unstructured data to semi-structured data and loading it into Hive and HBase tables for integration.
- Loaded log data into HDFS using Flume.
- Wrote Apache Pig scripts to process data stored in HDFS.
- Developed Spark SQL scripts with Python for analysis and demo purposes.
Environment: MapReduce, Spark, HDFS, Pig, HBase, Oozie, Zookeeper, Sqoop, Linux, Kafka, Hadoop, Maven, NoSQL, MySQL, Hive, Java, Eclipse, Python.
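A hedged sketch of the Kafka-to-MySQL Spark application mentioned above, using PySpark Structured Streaming. The broker address, topic, event schema, table name, and credentials are hypothetical placeholders, and the job assumes the Spark-Kafka connector and the MySQL JDBC driver are on the classpath:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-to-mysql-sketch").getOrCreate()

# Hypothetical schema of the JSON events arriving on the Kafka topic.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("product",  StringType()),
    StructField("amount",   DoubleType()),
])

# Read the stream from Kafka (broker and topic names are assumptions).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "orders")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Write each micro-batch to MySQL over JDBC.
def write_to_mysql(batch_df, batch_id):
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:mysql://dbhost:3306/sales")
        .option("dbtable", "orders_stream")
        .option("user", "etl_user")
        .option("password", "********")
        .mode("append")
        .save())

query = (events.writeStream
         .foreachBatch(write_to_mysql)
         .option("checkpointLocation", "hdfs:///checkpoints/orders")
         .start())
query.awaitTermination()
```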
Hadoop Developer
Confidential
Responsibilities:
- Worked on Hadoop Ecosystem using different big data analytic tools including Hive, Pig.
- Involved in loading data from LINUX file system to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented partitioning and bucketing in Hive.
- Worked with different file formats (ORC, TextFile) and compression codecs (Gzip, Snappy, LZO).
- Worked with multiple input formats such as Text, Key-Value, and SequenceFile input formats.
- Ran Hadoop Streaming jobs to process terabytes of JSON-format data (a Python sketch follows this section).
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Executed Hive queries on Parquet tables to perform data analysis and meet the business requirements.
- Created HBase tables to store various data formats of incoming data from different portfolios.
- Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
- Developed the verification and control process for daily load.
- Provided daily production support, monitoring and troubleshooting Hadoop/Hive jobs.
- Worked collaboratively with different teams to smoothly transition the project to production.
Environment: HDFS, Pig, Hive, Sqoop, Shell Scripting, HBase, Zookeeper, MySQL.
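As a hedged illustration of the Hadoop Streaming JSON processing noted above, a minimal Python mapper of the kind such jobs typically use; the field names and invocation paths are assumptions, and the reducer would aggregate per portfolio in the same style as the earlier sketch:

```python
#!/usr/bin/env python
# mapper.py - Hadoop Streaming mapper that flattens JSON records
# into tab-separated (portfolio, amount) pairs for a downstream reducer.
import json
import sys

for line in sys.stdin:
    try:
        record = json.loads(line)
    except ValueError:
        continue                                   # skip malformed JSON lines
    portfolio = record.get("portfolio", "unknown")  # hypothetical field names
    amount = record.get("amount", 0)
    print("%s\t%s" % (portfolio, amount))

# Typical invocation (jar and HDFS paths are placeholders):
# hadoop jar hadoop-streaming-*.jar \
#   -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py \
#   -input /data/portfolio_events -output /data/portfolio_agg
```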
Software Developer
Confidential
Responsibilities:
- Performed analysis of the client requirements based on the detailed design documents developed.
- Developed Use Cases, Class Diagrams, Sequence Diagrams and Data Models using Microsoft Visio.
- Developed Struts forms and actions for validation of user request data and application functionality.
- Developed a web service using SOAP, WSDL, XML, and SoapUI.
- Developed JSPs with Struts custom tags and implemented JavaScript validation of data.
- Involved in developing the business tier using stateless session beans.
- Used JavaScript for web page validation and the Struts Validator for server-side validation.
- Designed the database and coded SQL, PL/SQL, triggers, and views using IBM DB2.
- Applied the Business Delegate, Data Transfer Object, and Data Access Object design patterns.
- Developed Message Driven Beans for asynchronous processing of alerts.
- Used ClearCase for source code control and JUnit for unit testing.
- Simulated networks in real time using an ns-3 network simulator modified for multithreading across multiple cores, implemented on a generic Linux machine.
- Involved in peer code reviews and performed integration testing of the modules.
Environment: Struts, JSPs with Struts, JDBC, Struts Validator, SQL, PL/SQL, IBM DB2, JUnit, Java/J2EE, JSP, Servlets, EJB 2.0, SQL Server, Oracle 9i, JBoss & WebLogic Server 6, JavaScript.
TECHNICAL SKILLS
Programming Languages: C, C++, Java (core), J2EE, UNIX Shell Scripting, Python
Web Languages: HTML, JavaScript, CSS
Hadoop Ecosystem: MapReduce, HBase, Hive, Pig, Sqoop, Zookeeper, Oozie, Flume, HUE, Kafka, AWS EMR, Spark, Spark SQL
Database Languages: SQL, PL/SQL
Databases: MySQL, Oracle, NoSQL (HBase, MongoDB)
Virtualization & Cloud Tools: Amazon AWS, VMware, VirtualBox
Visualization Tools: Power BI, Tableau
Web/Application Servers: Apache Tomcat
Version Control Tools: Git and SVN
Operating Systems: Windows, Linux (Ubuntu, Red Hat, CentOS)
IDE Platforms: Eclipse, NetBeans, Visual Studio
Methodologies: Agile, SDLC