Hadoop Developer Resume

PROFESSIONAL SUMMARY:

  • Excellent experience in developing applications that perform large-scale distributed data processing using Big Data ecosystem tools such as Hadoop, Hive, Pig, Sqoop, HBase, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Flume, Kafka, and YARN.
  • Hands-on experience with various Hadoop distributions (Cloudera, Hortonworks, MapR).
  • Deep knowledge of Spark architecture and how RDDs work internally; exposure to Spark Streaming, Spark SQL, and NoSQL databases such as Cassandra and HBase.
  • Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala, and Python (see the sketch after this list).
  • Hands-on experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Experience in implementing real-time event processing and analytics using Spark Streaming with messaging systems such as Kafka.
  • Wrote Spark scripts in Scala to find the most trending products, day-wise and week-wise.
  • Experience analyzing data using HiveQL, HBase, and custom MapReduce programs in Java.
  • Experience in collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Experience creating HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Created user-defined functions (UDFs) and user-defined aggregate functions (UDAFs) in Hive.
  • Well-versed in web technologies such as HTML, CSS, and JavaScript.
  • Wrote MapReduce programs in Java for data extraction, transformation, and aggregation across file formats including XML, JSON, CSV, Avro, Parquet, ORC, Sequence, and text.
  • Experienced in using Apache Spark to improve the performance and optimization of existing Hadoop algorithms, working with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Good knowledge of Oozie workflow design, development, and execution.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Experienced in all facets of the software development life cycle (analysis, design, development, testing, and maintenance) using Waterfall and Agile/Scrum methodologies.
  • Experienced in performing real-time analytics on NoSQL databases such as HBase and Cassandra.
  • Highly adept at quickly and thoroughly mastering new technologies, with a keen awareness of industry developments and next-generation programming solutions.
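For illustration, a minimal sketch in Scala of the Hive-to-Spark conversion described above, assuming a hypothetical Hive table `sales` with a string `region` column and a double `amount` column (all names invented for this example):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkExample")
      .enableHiveSupport() // resolve Hive tables through the metastore
      .getOrCreate()

    // Original HiveQL:
    //   SELECT region, SUM(amount) AS total FROM sales
    //   GROUP BY region ORDER BY total DESC;
    // DataFrame equivalent:
    val totals = spark.table("sales")
      .groupBy("region")
      .agg(sum("amount").alias("total"))
      .orderBy(desc("total"))
    totals.show()

    // The same aggregation expressed as plain RDD transformations:
    val totalsRdd = spark.table("sales").rdd
      .map(row => (row.getAs[String]("region"), row.getAs[Double]("amount")))
      .reduceByKey(_ + _)
      .sortBy(-_._2)
    totalsRdd.take(10).foreach(println)

    spark.stop()
  }
}
```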

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Storm, Scala, Spark, Apache Kafka, Flume, Solr, Ambari

Database: Oracle 10g/11g, PL/SQL, MySQL, MS SQL Server 2012

Languages: Java, Scala

Development Methodologies: Agile, Waterfall

Testing: JUnit, Selenium WebDriver

NoSQL Databases: HBase

BI/Reporting Tools: Tableau

IDE Tools: Eclipse, NetBeans, IntelliJ IDEA

Modelling Tools: Rational Rose, StarUML, Visual Paradigm for UML

Architecture: Relational DBMS, Client-Server

Operating System: Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS X

PROFESSIONAL EXPERIENCE:

Confidential

Hadoop Developer

Responsibilities:

  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data, and used Spark transformations for data wrangling and for ingesting real-time data in various file formats.
  • Loaded CSV/TXT/Avro/Parquet files in the Spark framework using Scala/Java, processed the data through Spark DataFrames and RDDs, and saved the output in Parquet format in HDFS for loading into fact tables via the ORC reader (see the first sketch after this list).
  • Worked with SparkContext, Spark SQL, DataFrames, Datasets, and Spark on YARN.
  • Collected data from an AWS S3 bucket in near-real time using Spark Streaming and performed the necessary transformations and aggregations.
  • Involved in loading data from UNIX/LINUX file system to HDFS.
  • Analyzed the data by performing Hive queries.
  • Developed simple to complex MapReduce jobs using Hive, Pig, and Python.
  • Extended Hive functionality by writing custom UDFs (see the UDF sketch after this list).
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Developed data pipelines using Hive from Teradata and DB2 data sources; these pipelines used custom UDFs to extend the ETL functionality.
  • Developed Hive queries and UDFs to analyze and transform data in HDFS, and developed Hive scripts implementing control-table logic.
  • Designed and implemented static and dynamic partitioning and bucketing in Hive.
  • Automated the code deployment process.
  • Performed sentiment analysis and trend analysis of products and displayed the results to users.
  • Used Flume to stream log data from various sources.
  • Configured Flume to extract the data from the web server output files to load into HDFS.
  • Installed and maintained Hadoop and NoSQL applications.
  • Worked on NoSQL databases such as HBase and MongoDB for a POC on storing images and URIs.
  • Managed and reviewed Hadoop and HBase log files; created HBase tables to load large sets of semi-structured data coming from various sources.
  • Performed data analysis on HBase data using Hive external tables, and exported the analyzed data using Sqoop to generate reports for the BI team.
  • Imported data from relational databases into the Hadoop cluster using Sqoop.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Monitored Hadoop cluster job performance and capacity planning; provided architectural designs to business users.
  • Responsible for cluster maintenance: adding and removing cluster nodes, monitoring and troubleshooting the cluster, and managing and reviewing data backups and Hadoop log files.
  • Installed the Oozie workflow engine to automate MapReduce jobs.
  • Built and sized the Hadoop cluster based on the data extracted from all sources.
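A minimal sketch of the file-loading flow described in the first bullet above, assuming a hypothetical CSV input path and a pre-existing Hive fact table partitioned by `order_date` (paths, table, and column names are invented for this example):

```scala
import org.apache.spark.sql.SparkSession

object CsvToParquetExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvToParquetExample")
      .enableHiveSupport()
      .getOrCreate()

    // Load a CSV file into a DataFrame, inferring the schema.
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/orders.csv")

    // Persist in Parquet format on HDFS for the downstream fact-table load.
    orders.write.mode("overwrite").parquet("hdfs:///data/staged/orders")

    // Dynamic-partition insert into the partitioned Hive fact table.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    orders.createOrReplaceTempView("orders_staged")
    spark.sql("""
      INSERT INTO TABLE fact_orders PARTITION (order_date)
      SELECT order_id, customer_id, amount, order_date FROM orders_staged
    """)

    spark.stop()
  }
}
```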
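A minimal sketch of a custom Hive UDF, as mentioned in the bullet on extending Hive functionality; the `MaskUdf` name and its masking behavior are invented for this example:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// A simple Hive UDF that masks all but the last four characters of its input.
// Registered in Hive with, e.g.:
//   ADD JAR mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_tail AS 'MaskUdf';
class MaskUdf extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val visible = 4
    if (s.length <= visible) new Text(s)
    else new Text("*" * (s.length - visible) + s.takeRight(visible))
  }
}
```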

Confidential

Jr. Hadoop Developer

Responsibilities:

  • Created Hive external and managed tables, and designed data models in Hive.
  • Configured the Hive metastore with MySQL to store the metadata of Hive tables.
  • Responsible for loading unstructured and semi-structured data from various sources into the Hadoop cluster using Flume, and for managing the ingest flows.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability
  • Successfully migrated a legacy application to a big data application using Hive/HBase at the production level.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data, including Avro, sequence, and XML files.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Created internal or external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency (see the DDL sketch after this list).
  • Utilized an Apache Hadoop environment distributed by Cloudera.
  • Served as point of contact for the initial MongoDB clusters, Hadoop clusters, and various Teradata servers; successfully tested and onboarded them and performed basic admin and DBA tasks.
  • Queried the MongoDB database using JSON.
  • Supported code/design analysis, strategy development and project planning
  • Developed simple to complex MapReduce jobs using Hive and HBase.
  • Enabled concurrent access to Hive tables with shared/exclusive locks by implementing ZooKeeper in the cluster.
  • Transferred data from various OLTP data sources, such as Oracle, MS Access, MS Excel, Flat files, CSV files into SQL Server
  • Helped the Business intelligence team in designing dashboards and workbooks
  • Analyzed complex data structures of different types (structured, semi-structured) and de-normalized them for storage in Hadoop.
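As a concrete illustration of the partitioned internal/external Hive tables mentioned above, a minimal DDL sketch issued through Spark; the table name, columns, and HDFS location are invented for this example:

```scala
import org.apache.spark.sql.SparkSession

object HiveDdlExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveDdlExample")
      .enableHiveSupport()
      .getOrCreate()

    // External table: Hive owns only the metadata; the data stays at the HDFS location.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        ip     STRING,
        url    STRING,
        status INT
      )
      PARTITIONED BY (log_date STRING)
      STORED AS PARQUET
      LOCATION 'hdfs:///data/external/web_logs'
    """)

    // Static partitioning: register a partition whose value is fixed up front.
    spark.sql("ALTER TABLE web_logs ADD IF NOT EXISTS PARTITION (log_date = '2016-01-01')")

    spark.stop()
  }
}
```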

Confidential

Data Engineer

Responsibilities:

  • Collaborated on insights with other Data Scientists, Business Analysts, and partners.
  • Uploaded data to Hive in Hadoop and combined new tables with existing databases.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Implemented a POC using Apache Impala for data processing on top of Hive.
  • Developed Scala scripts and UDFs using DataFrames, Datasets, and SQL in Spark 2.1 for data aggregation and queries, writing data back into the OLTP system through Sqoop (see the sketch after this list).
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; designed, created, and maintained Git repositories according to client specifications.
  • Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Automated jobs using Oozie.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, and DataFrames.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Extracted data from Oracle, MS SQL Server, and MySQL databases to HDFS using Sqoop.
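A minimal sketch of the Spark 2.1 aggregation flow described above: a small UDF applied during a DataFrame aggregation, with the result landed in HDFS for a separate Sqoop export into the OLTP database. The `transactions` table, the banding logic, and all paths are invented for this example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object AggregateAndExportExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AggregateAndExportExample")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Illustrative UDF: bucket transaction amounts into bands.
    val band = udf((amount: Double) => if (amount >= 1000.0) "high" else "low")

    val summary = spark.table("transactions") // hypothetical Hive table
      .withColumn("band", band($"amount"))
      .groupBy($"band")
      .agg(count(lit(1)).alias("txn_count"), sum($"amount").alias("txn_total"))

    // Land the result in HDFS; a separate Sqoop export job would then move it
    // into the OLTP database, along the lines of:
    //   sqoop export --connect jdbc:mysql://dbhost/sales --table band_summary \
    //     --export-dir /data/out/band_summary --input-fields-terminated-by ','
    summary.write.mode("overwrite").csv("hdfs:///data/out/band_summary")

    spark.stop()
  }
}
```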

Confidential

IT Helpdesk Technician

Responsibilities:

  • Provide hardware and software support, including the installation of new software and updates when required, across all supported sites.
  • Served in computer maintenance, performing all types of hardware and software maintenance and engineering, in addition to systems selection, backup, and technical support.
  • Implemented and maintained the EIGRP and OSPF routing protocols in the network.
  • Configured and demonstrated switching concepts such as trunking, EtherChannel, and inter-VLAN routing.
  • Troubleshot network issues and provided incident reviews.
  • Created user accounts and provided authorization.
  • Configured VLANs for different departments and maintained the network.
  • Spearheaded meetings and discussions with team members regarding network optimization and BGP issues.
  • Handled switching-related tasks, including implementing VLANs and VTP and configuring Fast EtherChannel between switches.
  • Configured IP allocation and subnetting for all applications, servers, and other needs throughout the company using FLSM and VLSM addressing (see the worked example after this list).
  • Troubleshot issues related to the RIP, OSPF, and EIGRP routing protocols.
  • Perform routine network maintenance checks as well as configure and manage printers, copiers, and other miscellaneous network equipment.
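For context, a brief worked example of the VLSM addressing mentioned above (addresses invented for illustration): carving 192.168.10.0/24 with VLSM, a department needing 100 hosts gets 192.168.10.0/25 (126 usable addresses), a department needing 50 hosts gets 192.168.10.128/26 (62 usable), and point-to-point router links get /30 blocks (2 usable each); FLSM, by contrast, would force every subnet to the same fixed size.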
