Hadoop Developer Resume
Chantilly, VA
PROFESSIONAL SUMMARY:
- Over 7 years of experience in data analysis, design, development, and implementation of enterprise applications, data integration, object-oriented programming, and source control management.
- Excellent understanding and hands-on experience in installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, and YARN, along with Pig and Hive for data analysis, HBase, Impala, Sqoop for data migration, Flume for data ingestion, Elasticsearch, Oozie for scheduling, Zookeeper for coordinating cluster resources, and Kafka, Spark, and Cassandra on Cloudera and Hortonworks distributions.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
- Developed collections in MongoDB and performed aggregations on those collections.
- Excellent understanding and knowledge of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Involved in creating Hive tables, loading them with data, and writing Hive queries, which invoke and run MapReduce jobs in the backend.
- Experience in writing HiveQL queries to store processed data into Hive tables for analysis.
- Experience in building Pig scripts to extract, transform, and load data onto HDFS for processing.
- Knowledge and understanding of the latest Hadoop ecosystem developments, such as Apache Spark integration with Hadoop.
- Loaded streaming log data from various web servers into HDFS using Flume.
- Experience in data migration from RDBMS to Cassandra.
- Used Kibana for data visualization and report generation.
- Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and in performing data transformations using Spark Core (an illustrative sketch follows this summary).
- Expertise in developing real-time streaming solutions using Spark Streaming.
- Good experience in web technologies such as HTML, CSS, JavaScript, AJAX, Servlets, JSON, and XML, as well as AWS.
- Worked closely with individuals at various levels to coordinate and prioritize multiple projects; estimated scope, scheduled, and tracked projects throughout the SDLC.
- Hands-on experience with Sqoop for importing and exporting data between HDFS and relational databases.
- Experience in importing real-time data such as log data and social networking data (Twitter) into HDFS using Flume.
- Developed custom UDFs for Pig and Hive using Java to process and analyze the data.
- Experience in analyzing data using HiveQL, Pig Latin, and custom Map Reduce programs in Java.
- Worked on various databases such as MySQL, Oracle, and MS SQL Server.
- Worked on NoSQL databases including HBase, MongoDB .
- Experience with UNIX commands and shell scripting. Strong understanding of Agile methodologies.
- Good knowledge of analyzing Software Requirement Specification (SRS) documents. Strong knowledge of the Software Development Life Cycle (SDLC). Good knowledge of writing and understanding use cases.
- Good knowledge on Data Warehousing, ETL development, Distributed Computing, and large-scale data processing
- Strong knowledge of creating and monitoring Hadoop clusters on VMs, using Hortonworks Data Platform 2.1 and 2.2 and CDH3/CDH4 with Cloudera Manager on Linux (Ubuntu). Good knowledge of JDBC/ODBC.
- Experience in creating databases, users, tables, triggers, macros, views, stored procedures, functions, packages, joins, and hash indexes.
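As an illustration of the Spark RDD/DataFrame work described above, the following minimal Scala sketch parses raw log lines into a case class, applies RDD transformations, and aggregates with the DataFrame API. The path, field names, and threshold are hypothetical placeholders, not project specifics.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record layout; field names are assumptions for illustration only.
case class LogRecord(host: String, status: Int, bytes: Long)

object RddTransformExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("RddTransformExample").getOrCreate()
    import spark.implicits._

    // Read raw lines from HDFS (placeholder path), parse into case classes,
    // and apply RDD transformations followed by an action.
    val records = spark.sparkContext
      .textFile("hdfs:///data/weblogs/*.log")
      .map(_.split("\\s+"))
      .filter(_.length >= 3)
      .map(f => LogRecord(f(0), f(1).toInt, f(2).toLong))

    // Convert to a DataFrame for Spark SQL-style aggregation.
    val errorsByHost = records.toDF()
      .filter($"status" >= 500)
      .groupBy($"host")
      .count()

    errorsByHost.show()
    spark.stop()
  }
}
```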
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop, Map Reduce, YARN, Pig, Hive, Hue, Ambari, HBase, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Storm, Apache NiFi
Big Data Distributions: Hortonworks, Cloudera
NoSQL Databases: HBase, Cassandra, MongoDB
Databases: Microsoft SQL Server, MySQL, Oracle, DB2, Teradata
Cloud Computing Tools: Amazon AWS
Web/Application Servers: Tomcat, LDAP
Languages: C, Java, Scala, Python, SQL, PL/SQL, Pig Latin, HiveQL, HTML, JavaScript, AJAX, XML, and Unix Shell Scripting
Operating Systems: VMware, Linux, Unix, OS/390, Hortonworks, Cloudera, Windows 98/XP/Vista/7
PROFESSIONAL EXPERIENCE:
Confidential, Chantilly, VA
Hadoop Developer
Responsibilities:
- Imported and exported data into HDFS and Hive using Sqoop.
- Responsible for data ingestion (ETL).
- Imported metadata from the Oracle database using Sqoop.
- Automated processes in the Cloudera environment and built Oozie workflows.
- Created Map Reduce jobs for data transformations and data parsing.
- Created Hive scripts for extracting summarized information from Hive tables.
- Wrote Hive UDFs to extract data from staging tables.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Supported MapReduce programs running on the cluster.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Created Hive tables and worked on them using HiveQL.
- Developed batch jobs and scripts to schedule various Hadoop programs.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, test automation.
- Used Zookeeper to provide coordination services to the cluster.
- Worked on loading files from mainframes to the data lake using Impala SQL and Python scripts.
- Used Spark Streaming to analyze near-real-time log data from web servers.
- Performed various Spark POCs to utilize in-memory processing capabilities.
- Imported millions of rows of structured data from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format (a brief sketch of this pattern follows this list).
- Involved in loading data from Linux and UNIX file systems to HDFS.
- Expertise in relational databases such as Oracle and MySQL.
- Set up a Puppet master and used Puppet to deploy directly to servers in the staging and DR environments.
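A minimal sketch of the Sqoop-to-Spark pattern noted above: Sqoop lands relational data in HDFS as delimited text, and Spark reads, filters, and re-stores it. The paths, schema, and column names are illustrative assumptions rather than the actual project layout.

```scala
import org.apache.spark.sql.SparkSession

object SqoopCsvProcessing {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SqoopCsvProcessing").getOrCreate()

    // CSV files previously imported by a Sqoop job (placeholder location and schema).
    val orders = spark.read
      .option("header", "false")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/sqoop/orders/")
      .toDF("order_id", "customer_id", "amount", "order_date")

    // Simple transformation: keep only high-value orders.
    val highValue = orders.filter(orders("amount") > 1000)

    // Store the processed result back into HDFS in CSV format.
    highValue.write.mode("overwrite").csv("hdfs:///processed/orders_high_value/")
    spark.stop()
  }
}
```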
Environment: Hadoop, HDFS, Map Reduce, Shell Scripting, Spark, Pig, Hive, HBase, Sqoop, Flume, Oozie, Zookeeper, Linux, Cloudera Manager, Hortonworks.
Confidential, Mooresville, NC
Hadoop Developer
Responsibilities:
- Developed transformations and aggregated the data for large data sets using MR, Pig and Hive scripts.
- Worked on partitioning and bucketing of Hive tables and ran the scripts in parallel to improve performance.
- Developed workflow jobs using Oozie to run the MR, Pig, and Hive jobs, and created JIL scripts to run the Oozie jobs.
- Improved performance using advanced joins in Apache Pig and Apache Hive.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loads and updated changes into the HDFS layer.
- Experience in reading and writing files to HDFS using the Java FileSystem API.
- Developed Pig and Hive UDFs based on requirements.
- Involved in creating Hive tables and applying HiveQL to them, which invokes and runs MapReduce jobs automatically.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries and Pig scripts.
- Participated in requirement gathering from the experts and business partners and converted the requirements into technical specifications.
- Used Zookeeper to manage coordination among the clusters.
- Analyzed Cassandra and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
- Assisted in cluster maintenance, monitoring, and troubleshooting; managed and reviewed data backups and log files.
- Worked with the Kafka API to develop publisher and subscriber components (a brief sketch follows this list).
- Worked on QA support, test data creation, and unit testing activities. Managed and supported infrastructure; used Maven for product lifecycle management.
- Created Hive tables as internal or external tables per requirements, defined with appropriate static and dynamic partitions for efficiency.
- Created queues and allocated cluster resources to provide priority for jobs.
- Experienced in setting up projects and volume configurations for new projects.
- Good troubleshooting skills in Hue, which provides a GUI for developers and business users for day-to-day activities.
- Set up Flume for different sources to bring external log messages into Hadoop HDFS.
- Created PL/SQL stored procedures, functions, and triggers for the Oracle 11g database.
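A minimal publisher/subscriber sketch using the Kafka clients API, in the spirit of the components mentioned above. The broker address, topic name, and group id are placeholder assumptions.

```scala
import java.util.{Collections, Properties}
import java.time.Duration
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.clients.consumer.KafkaConsumer

object KafkaPubSubSketch {
  val topic = "events" // placeholder topic name

  // Publish a single string message to the topic.
  def publish(message: String): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String](topic, message))
    producer.close()
  }

  // Poll the topic once and print whatever records arrive.
  def subscribe(): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("group.id", "demo-consumer-group")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList(topic))
    val records = consumer.poll(Duration.ofSeconds(5))
    records.forEach(r => println(s"${r.key} -> ${r.value}"))
    consumer.close()
  }
}
```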
Environment: Apache Hadoop 2.0.0, Pig 0.11, Hive 0.10, Sqoop 1.4.3, Flume, Map Reduce, HDFS, Linux, Oozie, Cassandra, Hue, HCatalog, Java, Eclipse.
Confidential
Hadoop Developer
Responsibilities:
- Extracted and updated data in HDFS using the Sqoop import/export command-line utility.
- Involved in creating Hive tables, then loaded and analyzed data using Hive queries.
- Created queries and tables using MySQL
- Performed partitioning and bucketing of Hive tables to store data on Hadoop (a brief sketch follows this list).
- Worked on setting up the streaming process with Flume.
- Developed SQL statements to improve back-end communications.
- Developed Impala scripts to meet end-user and analyst requirements for analysis.
- Crawled public Facebook posts and tweets.
- Used a 60-node cluster with Cloudera Hadoop Distribution on Amazon EC2
- Converted the output to structured data and worked with the analytics team to import it.
- Migrated ETL processes from RDBMS to Hive to test easier data manipulation.
- Handled different types of joins in Hive, such as map joins, bucket map joins, and sorted bucket map joins.
- Handled continuous streaming data from different sources using Flume, with HDFS set as the destination.
- Developed Pig Latin scripts to extract data from the web server output files to load into HDFS
- Followed agile methodologies to finish the tasks before the deadlines
- Worked with big data teams to move ETL tasks to Hadoop .
- Exported analyzed data to downstream RDBMS systems using Sqoop for generating end-user reports, business analysis reports, and payment reports.
- Optimized HiveQL and Pig scripts by using execution engines such as Tez.
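The partitioning and bucketing idea referenced above can be sketched with Spark's DataFrameWriter (shown here in place of the project's actual HiveQL DDL, to keep the example self-contained). The paths, database, table, and column names are placeholder assumptions.

```scala
import org.apache.spark.sql.SparkSession

object PartitionBucketSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionBucketSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Source data previously landed on HDFS (placeholder path and schema;
    // the 'analytics' database is assumed to exist in the metastore).
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/orders/")

    // Partition by load date for pruning; bucket by customer_id to speed up joins.
    orders.write
      .partitionBy("load_date")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("parquet")
      .saveAsTable("analytics.orders_bucketed")

    // Queries filtering on load_date only scan the matching partitions.
    spark.sql(
      "SELECT customer_id, SUM(amount) AS total FROM analytics.orders_bucketed " +
      "WHERE load_date = '2016-01-01' GROUP BY customer_id"
    ).show()

    spark.stop()
  }
}
```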
Environment: Hadoop, HBase, HDFS, Map Reduce, Java, Cloudera Manager, Amazon EC2, Sqoop, Flume, Hive
Confidential
SQL Developer
Responsibilities:
- Experience in all phases of SDLC including requirement analysis, application design, and development of multitier applications, testing, implementation and maintenance.
- Expertise in performance tuning, fine-tuning SQL queries for improved performance, and query optimization.
- Worked on the DB2 platform; created database tables, indexes, and triggers by writing SQL queries per database change requests from the business.
- Created stored procedures, functions, and packages by writing SQL queries based on client requirements, supporting day-to-day updates to business logic.
- Worked on optimizing large, complicated SQL statements to meet client requirements.
- Created large, complex SQL queries, such as pulling formulary data to generate tier-level data for various kinds of formulary and benefit lists.
- Used checkpoints, breakpoints, and logging in SSIS to debug and optimize packages while developing them.
- Provided SQL scripts and PL/SQL stored procedures for querying the database
- Developed reports presented in multiple formats: online via a web portal, e-mailed, and exported in Excel format to shared folders.
- Developed, tuned, and debugged queries, reports, triggers, and stored procedures in SQL and T-SQL.
- Very good experience in the complete ETL cycle, including data analysis, database design, data mapping and conversion, and data loading.
- Excellent problem analysis and solving skills.
- Able to work in a fast-paced environment; a good team player who can also work independently.
- Excellent at database backup, disaster recovery, installation, upgrade, and migration of databases and instances.
- Wrote the indexers that queried data from the MySQL database and indexed it into the search engine. Configured data import handlers that augmented the indexing process. Wrote an API to retrieve results relevant to user queries.
- Used joins and correlated and non-correlated sub-queries for complex business queries involving multiple tables and calculations across different databases (illustrated in the sketch below).
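A brief sketch of the correlated sub-query pattern mentioned above, wrapped in a JDBC call so the example is self-contained. The connection string, credentials, and table/column names are placeholder assumptions, and the Oracle JDBC driver is assumed to be on the classpath.

```scala
import java.sql.DriverManager

object CorrelatedSubqueryDemo {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details; not the actual environment.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "secret")
    try {
      // For each customer, keep only orders above that customer's own average amount.
      val sql =
        """SELECT o.customer_id, o.order_id, o.amount
          |FROM orders o
          |WHERE o.amount > (SELECT AVG(i.amount)
          |                  FROM orders i
          |                  WHERE i.customer_id = o.customer_id)""".stripMargin
      val rs = conn.createStatement().executeQuery(sql)
      while (rs.next()) {
        println(s"${rs.getLong("customer_id")} ${rs.getLong("order_id")} ${rs.getBigDecimal("amount")}")
      }
    } finally {
      conn.close()
    }
  }
}
```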
Environment: Oracle 10g/11g, PL/SQL, SQL, UNIX, Linux, Windows