Hadoop Developer Resume San Francisco, CA - Hire IT People

PROFESSIONAL SUMMARY:

8+ years of professional IT experience in software development, including 4 years of experience in the ingestion, storage, querying, processing, and analysis of Big Data using Hadoop technologies and solutions.
Excellent understanding of Hadoop architecture and the components of the Hadoop ecosystem such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce, and YARN.
Hands-on experience with Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Flume, ZooKeeper, Hue, Kafka, Storm, and Impala.
Experience with Agile Methodology.
Experienced in using Spark to improve the performance of existing Hadoop algorithms with SparkContext, Spark SQL, DataFrames, pair RDDs, and Datasets.
Developed Kafka producers that compress and bundle many small files into larger Avro and Sequence files before writing to HDFS, to make the best use of the Hadoop block size.
Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
Expertise in job workflow scheduling and monitoring tools like Oozie.
Developed simple to complex MapReduce jobs using Hive and Pig to handle files in multiple formats such as JSON, text, XML, and Sequence files.
Worked extensively with combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
Experience in working with different data sources such as flat files, XML files, log files, and databases.
Very good understanding and working knowledge of object-oriented programming (OOP).
Expertise in application development using Scala, RDBMS, and UNIX shell scripting.
Experience developing Scala applications for loading and streaming data into NoSQL databases (HBase) and into HDFS.
Worked on ingesting log data into Hadoop using Flume.
Experience in managing and reviewing Hadoop log files.
Experience in importing and exporting data between HDFS and relational database management systems using Sqoop.
Collected and stored streaming data (log data) in HDFS using Apache Flume.
Experience in optimizing queries by creating clustered and non-clustered indexes and indexed views, and in applying data modeling concepts.
Experience with scripting languages (Scala, Pig, Python, and Shell) to manipulate data.
Worked with relational database systems (RDBMS) such as MySQL and NoSQL database systems such as HBase, with basic knowledge of MongoDB and Cassandra.
Hands-on experience in identifying and resolving performance bottlenecks at various levels such as sources, mappings, and sessions.
Highly motivated, adaptive, and a quick learner.
Able to adapt to evolving technology, with a strong sense of responsibility and accomplishment.

TECHNICAL SKILLS:

Hadoop, HDFS, YARN, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Storm, Oozie, ZooKeeper, Impala, Hue.
HBase, Cassandra, MongoDB.
Cloudera Manager, Hortonworks.
Java, Scala.
Oracle 8i, 9i, 10g, 11g, MS SQL Server.
TCP/IP, DNS, NIS, NIS+, NFS, AutoFS.
CentOS, Ubuntu, Linux, Windows.

EXPERIENCE:

Confidential, San Francisco, CA

Hadoop Developer

Responsibilities:

Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
Developed use cases for processing real-time streaming data using tools like Spark Streaming.
Handled large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and efficient joins and transformations.
Imported required tables from RDBMS to HDFS using Sqoop and used Spark and Kafka to stream data into HBase in real time.
Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework, and handled JSON data.
Developed Spark code using Scala and Spark SQL for faster testing and data processing.
Responsible for batch processing of data sources using Apache Spark.
Developed predictive analytics using the Apache Spark Scala APIs.
Developed MapReduce jobs using the Java API to parse the raw data and store the refined data.
Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
Involved in identifying job dependencies to design workflows for Oozie and YARN resource management.
Worked on a product team using Agile Scrum methodology to design, develop, deploy, and support solutions that leverage the client's big data platform.
Primarily involved in the data migration process using Azure, integrating with a GitHub repository and Jenkins.
Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
Designed and coded from specifications; analyzed, evaluated, tested, debugged, documented, and implemented complex software applications.
Tuned Hive and Pig scripts to improve performance and resolved performance issues in both, based on an understanding of how joins, grouping, and aggregation translate into MapReduce jobs.
Created partitions and buckets based on state for further processing using bucket-based Hive joins.
Implemented Cloudera Manager on an existing cluster.
Worked extensively with the Cloudera Distribution of Hadoop (CDH 4.x and CDH 5.x).
Responsible for troubleshooting, debugging, and fixing incorrect or missing data in the Oracle database (MySQL).

Environment: HDFS, MapReduce, Java API, JSP, JavaBeans, Pig, Azure, Jenkins, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark Streaming, Storm, YARN, Eclipse, Unix Shell Scripting, Cloudera.

Confidential, Dearborn, MI

Hadoop Developer

Responsibilities:

Implemented data ingestion using Sqoop and Spark, loading data from various RDBMS sources.
Responsible for the design and development of Spark SQL scripts based on functional specifications.
Handled data cleansing and transformation tasks using Spark (with Scala) and Hive.
Involved in converting Hive queries into Spark DataFrames and Datasets using Scala.
Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, and pair RDDs.
Explored Spark to improve the performance of existing algorithms in Hadoop.
Responsible for handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and efficient joins and transformations during the ingestion process itself.
Implemented data consolidation using Spark and Hive to generate data in the required formats by applying various ETL tasks, including data repair, massaging data to identify the source for audit purposes, and data filtering, and stored the results back to HDFS.
Developed ETL to normalize this data and publish it in Impala.
Responsible for job management using the Fair Scheduler and developed job processing scripts using Oozie workflows.
Wrote a shell script to convert all Hive internal tables to external tables.
Integrated Hive with HBase.
Primarily responsible for designing, testing, and maintaining database solutions for Azure.
Responsible for performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism, and memory tuning.
Imported and exported data into HDFS, Hive, and Pig using Sqoop.
Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
Implemented workflows using the Apache Oozie framework to automate tasks.
Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
Worked with different file formats such as text, Sequence files, Avro, ORC, and Parquet.
Responsible for managing data coming from different sources.
Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
Analyzed large data sets to determine the optimal way to aggregate and report on them.

Environment: Scala, Hive, HBase, Flume, Java, Impala, Pig, Spark, Oozie, Oracle, YARN, JUnit, Unix, Cloudera, Sqoop, HDFS, Python.

Confidential, Saint Louis, MO

Hadoop Developer

Responsibilities:

Ingested structured, semi-structured, and unstructured datasets into Hive and NoSQL databases like HBase, using an open-source Hadoop distribution and Apache tools like Flume and Sqoop.
Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
Designed and built a reporting application that uses Spark SQL to fetch and generate reports on HBase table data.
Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
Implemented helper classes that access HBase directly from Java using the Java API.
Integrated MapReduce with HBase to bulk import large amounts of data into HBase using MapReduce programs.
Responsible for converting ETL operations to the Hadoop system using Pig Latin operations, transformations, and functions.
Extracted the needed data from the server into HDFS and bulk loaded the cleaned data into HBase.
Handled different time series data using HBase to store data and perform time-based analytics to improve query retrieval time.
Participated with the admin team in installing and configuring MapReduce, Hive, and HDFS.
Implemented a CDH3 Hadoop cluster on CentOS and assisted with performance tuning and monitoring.
Used Impala to analyze data ingested into HBase and compute various metrics for reporting on the dashboard.
Managed and reviewed Hadoop log files.
Involved in review of functional and non-functional requirements.

Environment: Hortonworks Hadoop 2.0, EMP, Cloud Infrastructure (Amazon AWS), Java, Python, HBase, Hadoop Ecosystem, Linux, Scala.

Confidential, Philadelphia, PA

Hadoop Developer

Responsibilities:

Involved in designing and developing Hadoop MapReduce jobs using the Java Runtime Environment for batch processing to search and match the scores.
Involved in developing Hadoop MapReduce jobs for merging and appending the repository data.
Worked on developing applications using Hadoop big data technologies: Pig, Hive, MapReduce, and Oozie.
Used Oozie workflows to automate the data loading process into the Hadoop Distributed File System (HDFS) and used Pig to preprocess the data, enabling speedy reviews and first-mover advantages.
Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, Sqoop, and Flume).
Worked on the Oozie workflow engine for job scheduling.
Imported and exported large sets of data into and out of HDFS using Sqoop.
Used Java for reading data from a MySQL database and transferring it to HDFS.
Transferred log files from the log-generating servers into HDFS.
Read the log data from HDFS using advanced HiveQL (serialization/deserialization).
Executed HiveQL commands on the CLI (command line interface) and transferred the required output data back to HDFS.
Worked on Hive partitioning and bucketing concepts and created Hive external and internal tables with partitions.

Environment: Hadoop, MapReduce, HDFS, Hive, SQL, Pig, ZooKeeper, MongoDB, CentOS, Cloudera Manager, Sqoop, Oozie, MySQL, HBase, Solr, Java.

Confidential, Chicago, IL

Java Developer

Responsibilities:

Developed JavaScript behavior code for user interaction.
Developed UI screens for a data entry application in Java GUI.
Implemented the project according to the Software Development Life Cycle (SDLC).
Developed front-end screens using JSP with tag libraries and HTML pages.
Followed coding guidelines and updated the leads on status in a timely manner.
Involved in the requirements gathering, analysis, design, development, testing, and maintenance phases of the application.
Used core Java concepts such as collections, generics, exception handling, I/O, and concurrency to develop business logic.
Used JSON for serializing and deserializing data sent to or received from JSP pages.
Worked closely with QA, business, and architecture teams to resolve defects quickly and meet deadlines.
Ensured all open issues and/or risks were documented prior to moving to the next testing stage.
Involved in writing integration tests and testing the workflow of the service.
Involved in writing JUnit test cases and testing functionality; also involved in smoke testing and integration testing.
Created style sheets (CSS) to control the look and feel of the entire site.
Developed client-side screens using HTML.
Used Eclipse as IDE.
Wrote multiple MapReduce programs in Java for data analysis.
Involved in submitting and tracking MapReduce jobs using JobTracker.
Used HTML and CSS as view components in MVC.
Verified that all entry/exit criteria were completed with appropriate sign-off.
The work consisted mainly of parsing data from the source databases into the warehouse.

Environment: Core Java, JavaScript, Java GUI, HTML, CSS, JUnit, Eclipse, UML, JSON, XML, Web Services, WSDL, Unix, MVC, JSP.

Confidential, Ventura, CA

Java/J2EE Developer

Responsibilities:

Automated code deployment to production environment by creating tasks using ANT deployment tool.
Involved in system design, enterprise application development using object-oriented analysis in Java/JEE6.
Developed stored procedures, views and triggers using Oracle PL/SQL.
Involved in the analysis & design of the application using UML with Rational Rose.
Developed a web-based application using Java, JSP, Servlets, and HTML, following the SDLC (Software Development Life Cycle) model.
Used JSP and HTML for creating the UI. Used JavaScript for client-side validation.
Implemented SQL queries to retrieve and insert data from/into the database using Oracle 10g.
Implemented a complex back-end component that quickly computes counts against a large MySQL database using Java multithreading.
Used Hibernate as the ORM to map Java classes to database tables.
Created named queries, HQL queries, typed queries, and query results within Hibernate.
Developed XML and XSD documents with SAX and DOM parsers and implemented System Oriented Architecture methodology.
Used Ant to build the code and deployed the application on IBM WebSphere application server.
Involved in Code Review and in Unit testing using JUnit and Integration testing of the application.
Used Confidential as the version control system to keep track of all work and changes and to allow several developers to collaborate. Deployed the web application on Apache Tomcat application server.

Environment: Java, JSP, XML, SQL, Hibernate, HQL queries, XSD, SAX and DOM parsers, Web Services, JSON, JUnit, CentOS 6, OpenLDAP, Jetspeed, jQuery.
