
Hadoop Developer/ Administrator Resume


SUMMARY:

  • Confidential years of experience in Information Technology, including Big Data and the Hadoop ecosystem. In-depth knowledge and hands-on experience with Apache Hadoop components such as HDFS, MapReduce, HiveQL, HBase, Pig, Hive, Sqoop, Oozie, Cassandra, Flume, and Spark.
  • Hands-on experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology.
  • Hands on experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries, using Oracle, DB2 and MySQL.
  • Extensively used Kafka to load log data from multiple sources directly into HDFS, and loaded streaming log data from various web servers into HDFS using Flume. Working knowledge of RabbitMQ.
  • Expertise in database design, creation and management of schemas, writing Stored Procedures, Functions, DDL, DML, SQL queries & Modeling.
  • Skilled in leadership, self-motivated, and able to work effectively in a team. Excellent communication and analytical skills along with a can-do attitude.
  • Proficient in Hive and Impala queries for loading and processing data in the Hadoop Distributed File System (HDFS).
  • Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases like Cassandra and MongoDB.
  • Hands-on experience building a data lake cluster with Hortonworks Ambari on AWS using EC2 and S3.
  • Strong knowledge of MongoDB concepts, including CRUD operations, the aggregation framework, and document schema design (a short pymongo sketch follows this list).
  • Experience in maintenance/bug-fixing of web based applications in various platforms.
  • Experience in managing life cycle of MongoDB including sizing, automation, monitoring and tuning.
  • Experience in storing, processing unstructured data using NoSQL databases like HBase.
  • Experienced in developing RESTful web services and using the HBase native client API to query data from HBase.
  • Experience in working with MapReduce programs using Hadoop for working on Big Data.
  • Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data Warehouse tools for reporting and data analysis.
  • Experience building ETL data pipeline flows to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, and MySQL.
  • Experienced in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Experience with Oozie Workflow Engine in running workflow jobs with actions that run Java MapReduce and Pig jobs.
  • Good experience with both JobTracker (MapReduce 1) and YARN (MapReduce 2).
  • Experience in managing and reviewing Hadoop Log files generated through YARN.
  • Experience in using Apache Solr for search applications.
  • Knowledge of Business Intelligence and Reporting.
  • Experienced in Java, Spring Boot, Apache Tomcat, Maven, Gradle, Hibernate, and other open-source frameworks and software.
  • Passionate about working in Hadoop and big data technologies, data science, machine learning in Spark, big data processing, analytics, and visualization.
  • Versatile experience in utilizing Java tools in business, web and client server environments including Java platform, JSP, Servlets, Java beans and JDBC.
  • Expertise in developing the presentation layer components like HTML, CSS, JavaScript, JQuery, XML, JSON, AJAX and D3.
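
As referenced in the MongoDB bullet above, here is a minimal sketch of CRUD operations and an aggregation-framework query using the pymongo driver. The connection string, database, collection, and field names are hypothetical placeholders, not the actual project objects.

    from pymongo import MongoClient

    # Hypothetical connection details and schema, for illustration only.
    client = MongoClient("mongodb://localhost:27017")
    orders = client["salesdb"]["orders"]

    # CRUD: insert, read, update, delete
    orders.insert_one({"customer": "c-101", "amount": 250.0, "status": "NEW"})
    doc = orders.find_one({"customer": "c-101"})
    orders.update_one({"customer": "c-101"}, {"$set": {"status": "SHIPPED"}})
    orders.delete_many({"status": "CANCELLED"})

    # Aggregation framework: total order amount per customer
    pipeline = [
        {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
        {"$sort": {"total": -1}},
    ]
    for row in orders.aggregate(pipeline):
        print(row["_id"], row["total"])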

TECHNICAL SKILLS:

Hadoop Technologies/Tools: HDFS, YARN, MapReduce, Hive, Pig, Impala, HBase, Sqoop, Flume, Oozie, ZooKeeper, Storm, Kafka, MongoDB, Cassandra, Cloudera (CDH 4, CDH 5), Hortonworks, AWS, Rackspace, Shell Scripting; Hadoop application management, administration, monitoring, debugging, and performance tuning

Big Data Platforms: Hortonworks, Cloudera, Amazon

Programming Languages: C, C++, C#, Java (JSE, JDBC, JSP/Servlets, Spring), Scala, Python, COBOL, JCL, JavaScript, jQuery, HTML, CSS, XML, Web services, SQL, PL/SQL, Confidential -SQL, Shell Scripting

Operating Systems: Linux, UNIX, Red Hat, CentOS, Ubuntu, Mac OS X, Windows 7/8/10

Web Technologies: MQSeries, Struts, JUnit, ODBC, JDBC, HTML, XML, XSL, XSD, CSS, JavaScript, Hibernate, Spring

IDEs & Utilities: Eclipse, JCreator, NetBeans, GitHub, Jenkins, Maven, IntelliJ, Ambari.

Databases: Microsoft SQL Server, SQL Profiler, Oracle 10g, MySQL, HBase, Cassandra, HiveQL

Protocols: TCP/IP, HTTP, HTTPS, FTP, SMTP, and DNS

EMPLOYMENT HISTORY:

Confidential, NYC

Hadoop Developer/ Administrator

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, the HBase database, and Sqoop.
  • Created Hive schemas using performance techniques like partitioning and bucketing (a short Hive DDL sketch follows this list).
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Extensively worked on different file formats like PARQUET, AVRO & ORC
  • Worked on data processing using Spark and Scala; wrote Scala code for different Spark transformations and created RDDs over the data.
  • Worked with data serialization formats (Avro, JSON, CSV) for converting complex objects into byte sequences.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Good knowledge in other SQL and NoSQL Databases like MySQL, MS SQL, MongoDB, HBase and Cassandra
  • Developed PySpark code to read data from Hive, group the fields, and generate XML files. Enhanced the PySpark code to write the generated XML files to a directory and zip them into CDAs.
  • Involved in creating POCs to ingest and process streaming data using Spark Streaming and Kafka.
  • Worked in an aggressive Agile environment and participated in daily stand-ups/Scrum meetings.
  • Worked on NoSQL database including MongoDB, Cassandra and HBase .
  • Hands on Experience through hackathons in Scala, Clojure, Python, Perl, R, Ruby, Groovy & Grails.
  • Handled importing of data from various data sources, performed transformations using Hive, Map Reduce, Spark and loaded data into HDFS.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed file formats.
  • Working knowledge of modifying and executing UNIX shell scripts. Involved in web testing using SoapUI for different member and provider portals.
  • Involved in creating Oozie workflow and coordinator jobs for Hive jobs to kick off the jobs on time for data availability.
  • Evaluated performance of Spark application by testing on cluster deployment mode vs local mode
  • Experimented with submissions using test OIDs to the vendor website.
  • Migrated HiveQL to Spark SQL to compare Spark's performance with Hive's. Implemented proofs of concept for DynamoDB, Redshift, and EMR.
  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Hands on experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries, using Oracle, DB2 and MySQL.
  • Created and maintained Technical documentation for launching HADOOP Clusters and for executing pig Scripts.
  • Good working experience with different operating systems such as UNIX/Linux, Apple Mac OS X, and Windows.
  • Implemented a REST call to submit the generated CDAs to the vendor website.
  • Implemented Impyla to support JDBC/ODBC connections to HiveServer2.
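
As referenced in the Hive schema and Impyla bullets above, a minimal sketch of creating a partitioned, bucketed Hive table over a HiveServer2 connection with the impyla package. The host, database, table, and column names are hypothetical placeholders, and a reachable HiveServer2 endpoint is assumed.

    from impala.dbapi import connect

    # Hypothetical HiveServer2 host; assumes the impyla package is installed.
    conn = connect(host="hiveserver2.example.com", port=10000, auth_mechanism="PLAIN")
    cursor = conn.cursor()

    # Partitioning lets queries prune whole date directories;
    # bucketing on customer_id helps joins and sampling.
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS sales_db.orders (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    cursor.close()
    conn.close()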

Environment: Sqoop, StreamSets, Impyla, Pyspark, Solr, Oozie, Hive, Impala

Confidential, Dallas, Texas

Hadoop Developer

Responsibilities:

  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats like text, zip, XML, and JSON.
  • Supported Map Reduce Programs that are running on the cluster.
  • Worked on NoSQL database including MongoDB, Cassandra and HBase
  • Developed a NoSQL database in MongoDB using CRUD operations, indexing, replication, and sharding; sorted data using indexes.
  • Imported and exported data between relational databases like MySQL and HDFS/HBase using Sqoop.
  • Working knowledge of modifying and executing UNIX shell scripts. Involved in web testing using SoapUI for different member and provider portals.
  • Experience in optimizing MapReduce jobs using combiners and partitioners to deliver the best results; worked on application performance optimization for an HDFS/Cassandra cluster.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Designed ETL data pipeline flows to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, and MySQL.
  • Expertise with the tools in Hadoop Ecosystem including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, Yarn, Oozie, and Zookeeper.
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, MRv1 and MRv2 (YARN)
  • Developed simple to complex MapReduce streaming jobs using Java language for processing and validating the data.
  • Developed data pipeline using Flume, Sqoop, Pig and Java map reduce and Spark to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Wrote extensive Hive queries and fine-tuned them for performance as part of the multiple step process to get the required results for Tableau to generate reports.
  • Streamed data in real time using Spark with Kafka (a short streaming sketch follows this list).
  • Used persist and cache to keep the required RDDs in memory for reuse in subsequent transformations.
  • Experience with data serialization formats (Avro, JSON, CSV) for converting complex objects into byte sequences.
  • Experienced in using Zookeeper and OOZIE Operational Services for coordinating the cluster and scheduling workflows. Involved in Unit testing and delivered Unit test plans and results documents.
  • Worked on the core and Spark SQL modules of Spark extensively using programming languages like Scala, Python
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Hands on experience in performing real time analytics on big data using HBase and Cassandra in Kubernetes & Hadoop clusters.
  • Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the processed data back into HDFS.
  • Maintain Hadoop, Hadoop ecosystems, third party software, and database(s) with updates/upgrades, performance tuning and monitoring
  • Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka and Flume
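
As referenced in the real-time streaming bullet above, a minimal sketch of consuming Kafka data with Spark Structured Streaming and landing it in HDFS as Parquet. The broker, topic, and output paths are hypothetical, and the spark-sql-kafka connector package is assumed to be on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

    # Hypothetical broker and topic names, for illustration only.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "clickstream")
              .load())

    # Kafka delivers key/value as bytes; cast to strings before writing out.
    parsed = events.select(col("key").cast("string"), col("value").cast("string"))

    query = (parsed.writeStream
             .format("parquet")
             .option("path", "/data/streams/clickstream")
             .option("checkpointLocation", "/data/streams/_checkpoints/clickstream")
             .start())

    query.awaitTermination()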

Environment: Hadoop, HDFS, Hive, Impala, Pig, Sqoop, HBase, Shell Scripting, Scala, Python

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:

  • Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.
  • Experienced in running Hadoop streaming jobs to process terabytes of data.
  • Designed ETL data pipeline flows to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, and MySQL.
  • Created UDFs to store specialized data structures in HBase and Cassandra.
  • Extensive experience with SQL, PL/SQL, and database concepts; developed stored procedures and queries using PL/SQL.
  • Designing detailed technical components for complex applications utilizing high-level architecture, design patterns and reusable code.
  • Good understanding of NoSQL databases and hands on work experience in writing applications on NoSQL databases like HBase, Cassandra and MongoDB
  • Experience in installing and maintaining Cassandra by configuring the cassandra.yaml file as required, and performed reads and writes using DataStax connectivity (a short driver sketch follows this list).
  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Developed multiple POCs using PySpark and deployed on the Yarn cluster, compared the performance of Spark, with Hive and SQL/Teradata.
  • Developed Hive queries to process the data and generate data cubes for visualization; also involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in HDFS
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Designed and developed a flattened view (merged and flattened dataset) de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream systems.
  • Involved in loading data from UNIX file system to HDFS
  • Worked on Apache Flume for collecting and aggregating huge amounts of log data and stored it in HDFS for further analysis.
  • Hands-on experience with data storage and retrieval techniques, ETL (migration), and databases, including graph stores, relational databases, tuple stores, NoSQL, Hadoop, Pig, MySQL, and Oracle databases.
  • Extensively worked on Spark using Scala on the cluster for computational analytics; installed it on top of Hadoop and performed advanced analytical applications using Spark with Hive and SQL/Oracle.
  • Experienced in using Zookeeper and OOZIE Operational Services for coordinating the cluster and scheduling workflows.
  • Experienced in relational databases like MySQL, Oracle and NoSQL databases like HBase and Cassandra.
  • Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
  • Extracted the data from various sources into HDFS using Sqoop and ran Pig scripts on the huge chunks of data.
  • Involved in creating Hive Tables, loading data and writing Hive queries in Apache Hadoop environment by Cloudera.
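
As referenced in the Cassandra bullet above, a minimal sketch of reads and writes through DataStax connectivity using the DataStax Python driver (cassandra-driver). The contact points, keyspace, table, and columns are hypothetical placeholders.

    import uuid
    from cassandra.cluster import Cluster

    # Hypothetical contact points and keyspace, for illustration only.
    cluster = Cluster(["cassandra-node1.example.com"])
    session = cluster.connect("analytics")

    # Write: insert one event row
    session.execute(
        "INSERT INTO events (event_id, source, payload) VALUES (%s, %s, %s)",
        (uuid.uuid4(), "weblog", '{"path": "/home", "status": 200}'),
    )

    # Read: fetch a handful of rows back
    rows = session.execute("SELECT event_id, source, payload FROM events LIMIT 10")
    for row in rows:
        print(row.event_id, row.source)

    cluster.shutdown()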

Environment: Hadoop, HDFS, Hive, Scala, Spark, SQL, Teradata, UNIX Shell Scripting
