Sr. Hadoop Developer Resume
San Francisco, CA
SUMMARY
- Over 8 years of professional experience in systems analysis, software development, and training.
- Experience in Hadoop/Big Data technologies, with expertise in Hadoop ecosystem components HDFS, MapReduce programming, Sqoop, Pig, Hive, Oozie, Flume, Impala, and HBase for scalability, distributed computing, and high-performance computing.
- Hands-on experience working with Hadoop, HDFS, the MapReduce framework, and ecosystem components such as Hive, HBase, Kafka, Sqoop, and Oozie.
- Experience in installing, configuring, and administering Hadoop clusters for the Cloudera, Hortonworks, and MapR distributions.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience in column-oriented NoSQL databases like HBase and their integration with the Hadoop cluster.
- Strong Experience in Linux administration.
- Knowledge of Kafka and Storm, and hands-on experience with Spark.
- Integrated Splunk with Hadoop and set up jobs to export data to and from Splunk.
- Experience with Spark as a data-processing engine that operates on distributed data collections.
- Hands-on experience in Scala for working with Spark Core and Spark Streaming.
- Good experience with scripting languages such as Python and Scala.
- Worked on Oozie to manage data processing jobs for Hadoop.
- Hands-on experience gathering data from different nodes into a Greenplum database and then performing Sqoop incremental loads into HDFS.
- Good knowledge of the MapReduce framework, including MR daemons, the sort and shuffle phase, and task execution.
- Experience in storing and analyzing data using HiveQL, Pig Latin, Spark SQL, and custom MapReduce programs in Java; well versed in Core Java.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Experienced in working with various data sources such as Teradata and Oracle, and successfully loaded files into HDFS.
- Experience in writing and testing Map-Reduce programs to structure the data.
- Experience with Oozie Workflow Engine to automate and parallelize Hadoop MapReduce and Spark jobs.
- Well versed in scheduling Oozie jobs both sequentially and in parallel.
- Good experience with MapReduce performance optimization techniques for effective utilization of cluster resources.
- Experience working with MapR volumes and snapshots for data redundancy.
- Good level of experience in Core Java and J2EE technologies such as JDBC, Servlets, and JSP.
- Knowledge of custom MapReduce programs in Java.
- Experience in creating custom Solr Query components.
- Extensive experience developing SOA middleware based on Fuse ESB and Mule ESB; configured Elasticsearch, Logstash, and Kibana to monitor Spring Batch jobs.
- Working knowledge of HTML5 and expert-level proficiency in markup and scripting languages such as HTML, DHTML, XML, CSS, JavaScript, and jQuery.
- Expertise in using various Hadoop infrastructures such as MapReduce, Pig, Hive, HBase, Sqoop, Oozie, Flume.
- Configured different topologies for the Storm cluster and deployed them on a regular basis.
- Experienced in implementing a unified data platform to ingest data from different sources using Apache Kafka brokers and clusters with Java producers and consumers.
- Experienced in implementing complex algorithms on semi-structured and unstructured data using MapReduce programs.
- Experienced in working with structured data using Hive QL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
- Experienced in migrating ETL-style operations using Pig transformations, operators, and UDFs.
- Built Spark Streaming flows that collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase); a minimal sketch follows this summary.
- Specialization in data ingestion, processing, and development from various RDBMS data sources into a Hadoop cluster using MapReduce/Pig/Hive/Sqoop.
- Excellent understanding and knowledge of NoSQL databases like HBase, Cassandra, and MongoDB, as well as Teradata and data warehousing.
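A minimal Scala sketch of the Kafka-to-HBase Spark Streaming flow described above, for illustration only (the Spark 1.x direct-stream API for Kafka 0.8 and the HBase 1.x client are assumed); the broker list, topic, table, column family, and record layout are placeholders rather than actual project values.

```scala
import kafka.serializer.StringDecoder
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object LearnerEventStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("learner-events"), Seconds(30))

    // Direct (receiver-less) Kafka stream; placeholder brokers and topic.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("learner-events"))

    // Aggregate event counts per learner within each batch, then persist to HBase.
    stream.map { case (_, line) =>
      val fields = line.split(",")            // assumed CSV layout: learnerId,eventType,...
      (fields(0), 1L)
    }.reduceByKey(_ + _).foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One HBase connection per partition to avoid serializing clients across the cluster.
        val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("learner_profile"))
        records.foreach { case (learnerId, count) =>
          val put = new Put(Bytes.toBytes(learnerId))
          put.addColumn(Bytes.toBytes("stats"), Bytes.toBytes("event_count"), Bytes.toBytes(count))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```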
TECHNICAL SKILLS
Hadoop Ecosystem: Hadoop 2.2, HDFS, MapReduce, Sqoop, Hive, Pig, Impala, Oozie, Yarn, Spark, Kafka, Storm, Flume.
Hadoop Management & Security: Hortonworks, Cloudera Manager, Ubuntu.
Web Technologies: HTML, XHTML, XML, XSL, CSS, JavaScript
Server Side Scripting: UNIX Shell Scripting
Database: Oracle 10g, Teradata, Microsoft SQL Server, MySQL, DB2, SQL, RDBMS.
Programming Languages: Java, J2EE, JDBC, JSP, Java Servlets, JUNIT, Python, Scala.
Web Servers: Apache Tomcat 5.x, BEA WebLogic 8.x, IBM WebSphere 6.0/5.1.1
NoSQL Databases: HBase, MongoDB
OS/Platforms: Mac OS X 10.9.5, Windows, Linux, Unix
Client Side: JavaScript, CSS, HTML, jQuery
SDLC Methodology: Agile (SCRUM), Waterfall.
PROFESSIONAL EXPERIENCE
Confidential, San Francisco, CA
Sr. Hadoop Developer
Responsibilities:
- Created Hive Tables, loaded retail transactional data from Teradata using Sqoop.
- Loaded home mortgage data from the existing DWH tables (SQL Server) to HDFS using Sqoop.
- Loaded the load-ready files from mainframes to Hadoop, converting the files to ASCII format.
- Set up a fully distributed Apache Hadoop cluster using HDFS and MapReduce along with sub-projects such as Pig, Hive, Ambari, and Oozie.
- Worked on major and minor upgrades of HBase and Cassandra clusters.
- Involved in developing Unix scripts for validating source files and creating transformation and load jobs for 4 modules (Ongoing Advice, Advice Details, Case Details, Advice Fee Payment).
- Involved in writing complex SQL queries, Stored Procedures, triggers to access the data from Relational database.
- Extensively used Pig for data cleansing; proficient work experience with NoSQL (MongoDB) databases.
- Wrote Python applications to interact with the MySQL database using Spark's SQLContext and accessed Hive tables using HiveContext.
- Involved in developing Hive DDLs to create, alter and drop Hive tables.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Implemented a Storm topology with stream groupings to perform real-time analytical operations.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala, as well as in databases such as HBase and MongoDB.
- Collaborated with big data partners, including Cloudera, Hortonworks, and MapR, for Supermicro integrated solutions.
- Involved in processing ingested raw data using MapReduce, Apache Pig, and Hive.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into the Hadoop Distributed File System (HDFS), and moved data between MySQL and HDFS in both directions using Sqoop.
- Experience in collecting metrics for Hadoop clusters using Ganglia and Ambari.
- Used Python scripts to update the content in database and manipulate files
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Worked with NoSQL databases like HBase and MongoDB for POC purposes.
- Built a recommendation engine for portfolio and research articles using Apache Spark and MongoDB.
- Developed Spark SQL scripts and converted Hive UDFs to Spark SQL UDFs (see the sketch at the end of this role).
- Responsible for batch processing and real time processing in HDFS and NOSQL Databases.
- Responsible for retrieval of data from Cassandra and ingestion into Pig.
- Experience in customizing the MapReduce framework at various levels by writing custom InputFormats, RecordReaders, Partitioners, and data types.
- Experienced with multiple file formats in Hive, including Avro and SequenceFile.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Pig Script.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Involved in Hive-HBase integration by creating Hive external tables with HBase as the storage handler.
- Developed Spark scripts using the Python shell (PySpark) as per the requirements.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Experienced in defining job flows to run multiple Map Reduce and Pig jobs using Oozie.
- Installed and configured Hive and wrote HiveQL scripts.
- Database products: MS SQL, Hadoop/Apache MapR, Oracle, DB2, Informix Online.
- Experience creating ETL jobs to load JSON and server data into MongoDB and to move the transformed MongoDB data into the data warehouse.
- Created reports and dashboards using structured and unstructured data.
- Implemented HBase coprocessors (observers) for event-based analysis.
Environment: MapReduce, Spark SQL, Pig scripts, ETL, Flume, Kafka, Storm, MapR, Hadoop BI, Pig UDFs, Oozie, Avro, Hive, Java, Eclipse, Zookeeper.
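For illustration, a minimal Scala sketch of the Spark SQL work referenced above: querying an existing Hive table through HiveContext and re-registering a Hive UDF as a Spark SQL UDF, using the Spark 1.x API. The table name, column names, and UDF logic are assumed placeholders rather than the actual project artifacts.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object MortgageHiveQuery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mortgage-hive-query"))
    val hiveContext = new HiveContext(sc)

    // A Hive UDF that normalized state codes, re-expressed here as a Spark SQL UDF.
    hiveContext.udf.register("norm_state",
      (s: String) => if (s == null) null else s.trim.toUpperCase)

    // Query a Hive table previously loaded via Sqoop from the DWH (placeholder names).
    val report = hiveContext.sql(
      """SELECT norm_state(state) AS state, COUNT(*) AS loans
        |FROM mortgage_txn
        |GROUP BY norm_state(state)""".stripMargin)

    report.show(20)
    sc.stop()
  }
}
```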
Confidential, Denver, CO
Sr. Hadoop Developer
Responsibilities:
- Developed data pipeline using Flume, Sqoop, Pig and map reduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple jobs in Pig and Hive for data cleaning and pre-processing.
- Participated in Development and Implementation of MapR environment.
- Used Pig for transformations, event joins, filtering of bot traffic, and some pre-aggregations before storing the data onto HDFS.
- Involved in developing Pig UDFs for needed functionality not available out of the box in Apache Pig.
- Implemented POCs using Apache Kafka, Storm, and Spark.
- Importing and exporting data into HDFS and Hive using SQOOP.
- Experienced in querying data from various servers into MapR-FS.
- Used Pig as an ETL tool for transformations, event joins, and some pre-aggregations before storing the data onto HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting; POC work is in progress using Spark and Kafka for real-time processing.
- Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experienced in managing and reviewing the Hadoop log files.
- Responsible for managing data coming from different sources.
- Involved in Unit testing and delivered Unit test plans and results documents.
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Worked on Oozie workflow engine for job scheduling.
- Importing and exporting data into MapR-FS and Hive using Sqoop.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Migrated MapReduce jobs into Spark using Scala.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Expertise in data modeling and data warehouse design and development.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Imported data from different sources such as HDFS and HBase into Spark RDDs (see the sketch at the end of this role).
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Performed real time analysis on the incoming data.
Environment: MapReduce, HDFS, Hive, Pig, Spark, Spark-Streaming, Spark SQL, MapR, Storm, Apache Kafka, Sqoop, Java, Scala, CDH4, CDH5, AWS, Eclipse, Oracle, Git, Shell Scripting and Cassandra.
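As referenced in the bullets above, a minimal Scala sketch of loading an HBase table into a Spark RDD for in-memory computation (Spark 1.x with HBase's TableInputFormat); the table, column family, qualifier, and value encoding are assumptions for illustration.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object HBaseToRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-to-rdd"))

    // Point TableInputFormat at the source table (placeholder name).
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "customer_history")

    // Each record is (row key, Result); the scan is distributed across region servers.
    val rows = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // Pull one column per row and compute a simple in-memory aggregate.
    val balances = rows.map { case (_, result) =>
      Option(result.getValue(Bytes.toBytes("acct"), Bytes.toBytes("balance")))
        .map(v => Bytes.toString(v).toDouble)
        .getOrElse(0.0)
    }
    println(s"rows=${balances.count()}, total=${balances.sum()}")
    sc.stop()
  }
}
```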
Confidential, Santa Clara, CA
Hadoop Developer
Responsibilities:
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from Oracle into HDFS using Sqoop.
- Implemented automated methods and industry best practices for consistent installation and configuration of Greenplum across production and non-production environments.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Installed and configured Cloudera Manager for easy management of existing Hadoop cluster.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Responsible for managing and reviewing Hadoop log files. Designed and developed data management system using MySQL.
- Developed entire frontend and backend modules using Python on the Django web framework.
- Wrote Python scripts to parse XML documents and load the data into the database.
- Performed cluster maintenance, including creation and removal of nodes, using Cloudera Manager Enterprise and other tools.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Some hands-on data processing using Spark.
- Worked on NoSQL databases including HBase and ElasticSearch.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Used Tableau as a reporting and data visualization tool.
- Worked with Java, J2EE, Struts, Web Services, and Hibernate in a fast-paced development environment.
- Followed Agile methodology; interacted directly with the client to provide and receive feedback on features, suggest and implement optimal solutions, and tailor the application to customer needs.
- Set up proxy rules for applications in the Apache server and created Spark SQL queries for faster requests (see the sketch at the end of this role).
- Designed and developed the database design document and database diagrams based on the requirements.
- Developed UI of Web Service using Struts MVC Framework.
- Implemented Struts validation framework.
- Installed and configured Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Implemented Web Service security on JBoss Server.
- Implemented DAOs for data access using Spring ORM with Hibernate.
- Implemented/optimized complex stored procedures for performance enhancements.
- Designed the XML Schema for data transmission using xml documents.
Environment: HDFS, Hive, Pig, UNIX, SQL, Java MapReduce, Spark, Hadoop cluster, HBase, Sqoop, Oozie, Linux, Data Pipeline, Greenplum, Kafka, Python, MySQL, Storm, MapR-DB.
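The web-server-log extraction for this role was written in Pig Latin and analyzed with Hive; purely for illustration, the sketch below re-expresses a comparable filter/aggregate step as a Spark SQL query in Scala (Spark 1.x API), since Spark SQL queries are also referenced above. The HDFS path and the space-delimited log layout are assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Assumed log layout: "<ip> <status> <url>" per line.
case class AccessLog(ip: String, status: Int, url: String)

object WebLogReport {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("web-log-report"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Parse the raw logs, dropping malformed lines before loading into a DataFrame.
    val logs = sc.textFile("hdfs:///data/weblogs/*.log")
      .map(_.split(" "))
      .filter(f => f.length >= 3 && f(1).forall(_.isDigit))
      .map(f => AccessLog(f(0), f(1).toInt, f(2)))
      .toDF()

    // Register a temp table and run a Spark SQL aggregation over it.
    logs.registerTempTable("weblogs")
    sqlContext.sql(
      "SELECT url, COUNT(*) AS hits FROM weblogs WHERE status = 200 GROUP BY url ORDER BY hits DESC")
      .show(10)

    sc.stop()
  }
}
```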
Confidential
Java Developer
Responsibilities:
- Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC) using agile development methodology.
- Used JSP, JavaScript, HTML5, and CSS for manipulating, validating, and customizing error messages in the user interface. Used JBoss for EJB and JTA, and for caching and clustering purposes.
- Built presentation components in JSP pages using ICEfaces tag libraries.
- Responsible for the deployment of the code on the staging/QA server.
- Developed the GUI using JSP, AJAX, JavaScript, and the Spring framework. Involved in the development of Spring Framework controllers.
- Configured the URL mappings and bean classes using the Spring app-servlet.xml. Sybase was the database and MyBatis was used.
- Integrated push notifications for Android/iPhone using Javapns and GCM for the application.
- Worked with field engineers and teams to make the product more user-friendly. Performed testing for the GUI and back end.
- Wrote Web Services using SOAP for sending and getting data from the external interface.
- Used XSL/XSLT for transforming and displaying reports; developed schemas for XML.
- Involved in development of web interface using JSP, JSTL, Servlets, JavaScript and JDBC for administering and managing users and clients.
- Integrated third-party custom picker plugins into the application using jQuery for iPhone/Android web browsers.
- Used design patterns such as Business Delegate, Service Locator, Model View Controller, Session, and DAO.
- Responsible for the design of customizable headers and footers using the Tiles framework with Spring, and used JdbcTemplate to perform database operations on the server side (see the sketch at the end of this role).
Environment: J2EE, Java, Servlets, JSP, SQL, XML, JavaScript, JSTL, Ajax, CSS, Agile methodology, Java multithreading, collections, WebSphere, HTML5.
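For illustration of the JdbcTemplate usage mentioned above, a minimal sketch written in Scala for consistency with the earlier sketches (the project itself used Java with Spring and a Sybase database); the JDBC driver class, URL, credentials, table, and column are placeholders, not actual project values.

```scala
import org.springframework.jdbc.core.JdbcTemplate
import org.springframework.jdbc.datasource.DriverManagerDataSource

object HeaderFooterDao {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details; a real deployment would use a container-managed DataSource.
    val ds = new DriverManagerDataSource()
    ds.setDriverClassName("com.sybase.jdbc4.jdbc.SybDriver")
    ds.setUrl("jdbc:sybase:Tds:dbhost:5000/appdb")
    ds.setUsername("app_user")
    ds.setPassword("secret")

    val jdbc = new JdbcTemplate(ds)

    // Parameterized query via JdbcTemplate instead of hand-built SQL strings.
    val count = jdbc.queryForObject(
      "SELECT COUNT(*) FROM page_layout WHERE region = ?", classOf[Integer], "header")
    println(s"header layouts: $count")
  }
}
```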
Confidential
Java Developer
Responsibilities:
- Used JDBC, SQL and PL/SQL programming for storing, retrieving, manipulating the data.
- Responsible for creation of the project structure, development of the application with Java, J2EE and management of the code.
- Responsible for the Design and management of database in DB2 using Toad tool.
- Integrated third party plug-in tool for data tables with dynamic data using jQuery.
- Responsible for the deployment of the application on the server using IBM WebSphere and putty.
- Developed the application in an Agile environment with constant changes in the application scope and deadlines.
- Involved in designing and development of the ecommerce site using JSP, Servlets, EJBs, JavaScript and JDBC.
- Involved in client interaction and support for the application testing at the client location.
- Used AJAX for interactive user operations and client-side validations. Used XSL transforms on certain XML data.
- Performed an active role in the Integration of various systems present in the application.
- Responsible for providing services for mobile requests based on the user request.
- Performed logging of all debug, error, and warning messages at the code level using log4j.
- Involved in the UAT phase and production phase to provide continuous support to the onsite team.
- Used the HP Quality Center tool to actively resolve any bugs logged in any of the testing phases.
- Used XML for ORM mapping relations with the java classes and the database.
- Developed ANT script for compiling and deployment. Performed unit testing using JUnit.
- Used Subversion as the version control system. Extensively used Log4j for logging the log files.
Environment: Java, J2EE, PL/SQL, JSP, HTML, AJAX, Java Script, JDBC, XML, JMS, UML, JUnit.
Confidential
Java Developer
Responsibilities:
- Developed the applications using Java, J2EE, Struts, JDBC.
- Built applications for scale using JavaScript, Node.js, and React.js.
- Used SOAP UI Pro version for testing the Web Services.
- Involved in preparing the High Level and Detail level design of the system using J2EE.
- Created struts form beans, action classes, JSPs following Struts framework standards.
- Implemented the database connectivity using JDBC with Oracle 9i database as backend.
- Involved in the development of the underwriting process, which involves communication with outside systems using IBM MQ and JMS.
- Created a deployment procedure utilizing Jenkins CI to run the unit tests.
- Worked with JMS Queues for sending messages in point-to-point mode.
- Used PL/SQL stored procedures for applications that needed to execute as part of scheduling mechanisms.
- Developed SOAP based XML web services.
- Used JAXB to manipulate XML documents.
- Created XML documents using the StAX XML API to pass the XML structure to Web Services.
- Used Rational Clear Case for version control and JUnit for unit testing.
- Provided troubleshooting and error handling support in multiple projects.
Environment: JSP 1.2, Jasper Reports, JMS, XML, SOAP, JDBC, JavaScript, UML, HTML, JNDI, Apache Tomcat, ANT, and JUnit.