Sr. Hadoop Developer Resume
San Francisco, CA
SUMMARY
- Over 8 years of professional experience in systems analysis, software development, and training.
- Experience in Hadoop/Big Data technologies, with expertise in Hadoop ecosystem components HDFS, MapReduce programming, Sqoop, Pig, Hive, Oozie, Flume, Impala, and HBase for scalability, distributed computing, and high-performance computing.
- Hands-on experience working with Hadoop, HDFS, the MapReduce framework, and ecosystem components such as Hive, HBase, Kafka, Sqoop, and Oozie.
- Experience in installing, configuring, and administering Hadoop clusters for the Cloudera, Hortonworks, and MapR distributions.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience in column-oriented NoSQL databases like HBase and their integration with the Hadoop cluster.
- Strong Experience in Linux administration.
- Knowledge of Kafka and Storm, and hands-on experience with Spark.
- Integrated Splunk with Hadoop and set up jobs to export data to and from Splunk.
- Experience with Spark as a data-processing engine that operates on distributed data collections.
- Hands-on experience in Scala for working with Spark Core and Spark Streaming.
- Good experience with scripting languages such as Python and Scala.
- Worked on Oozie to manage data processing jobs for Hadoop.
- Hands-on experience gathering data from different nodes into a Greenplum database and then performing Sqoop incremental loads into HDFS.
- Good knowledge of the MapReduce framework, including MR daemons, the sort and shuffle phase, and task execution.
- Experience in storing and analyzing data using HiveQL, Pig Latin, Spark SQL, and custom MapReduce programs in Java; well versed in Core Java.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Experienced in working with various data sources such as Teradata and Oracle, and successfully loaded files into HDFS.
- Experience in writing and testing Map-Reduce programs to structure the data.
- Experience with Oozie Workflow Engine to automate and parallelize Hadoop MapReduce and Spark jobs.
- Well versed in scheduling Oozie jobs both sequentially and in parallel.
- Good experience with MapReduce performance optimization techniques for effective utilization of cluster resources.
- Experience working with MapR volumes and snapshots for data redundancy.
- Good level of experience in Core Java and J2EE technologies such as JDBC, Servlets, and JSP.
- Knowledge of custom MapReduce programs in Java.
- Experience in creating custom Solr Query components.
- Extensive experience developing SOA middleware based on Fuse ESB and Mule ESB; configured Elasticsearch, Logstash, and Kibana to monitor Spring Batch jobs.
- Working knowledge of HTML5 and expert-level proficiency in markup and scripting languages such as HTML, DHTML, XML, CSS, JavaScript, and jQuery.
- Expertise in using various Hadoop infrastructures such as MapReduce, Pig, Hive, HBase, Sqoop, Oozie, Flume.
- Configured different topologies for the Storm cluster and deployed them on a regular basis.
- Experienced in implementing a unified data platform to ingest data from different sources using Apache Kafka brokers and clusters with Java producers and consumers.
- Experienced in implementing complex algorithms on semi-structured and unstructured data using MapReduce programs.
- Experienced in working with structured data using Hive QL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
- Experienced in migrating ETL-style operations using Pig transformations, operators, and UDFs.
- Built Spark Streaming flows that collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase); a minimal sketch follows this summary.
- Specialization in data ingestion, processing, and development from various RDBMS data sources into a Hadoop cluster using MapReduce/Pig/Hive/Sqoop.
- Excellent understanding and knowledge of NoSQL databases like HBase, Cassandra, and MongoDB, as well as Teradata and data warehousing.
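A minimal Scala sketch of the Kafka-to-HBase Spark Streaming flow described above, for illustration only (the Spark 1.x direct-stream API for Kafka 0.8 and the HBase 1.x client are assumed); the broker list, topic, table, column family, and record layout are placeholders rather than actual project values.

```scala
import kafka.serializer.StringDecoder
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object LearnerEventStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("learner-events"), Seconds(30))

    // Direct (receiver-less) Kafka stream; placeholder brokers and topic.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("learner-events"))

    // Aggregate event counts per learner within each batch, then persist to HBase.
    stream.map { case (_, line) =>
      val fields = line.split(",")            // assumed CSV layout: learnerId,eventType,...
      (fields(0), 1L)
    }.reduceByKey(_ + _).foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One HBase connection per partition to avoid serializing clients across the cluster.
        val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("learner_profile"))
        records.foreach { case (learnerId, count) =>
          val put = new Put(Bytes.toBytes(learnerId))
          put.addColumn(Bytes.toBytes("stats"), Bytes.toBytes("event_count"), Bytes.toBytes(count))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```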
TECHNICAL SKILLS
Hadoop Ecosystem: Hadoop 2.2, HDFS, MapReduce, Sqoop, Hive, Pig, Impala, Oozie, Yarn, Spark, Kafka, Storm, Flume.
Hadoop Management & Security: Hortonworks, Cloudera Manager, Ubuntu.
Web Technologies: HTML, XHTML, XML, XSL, CSS, JavaScript
Server Side Scripting: UNIX Shell Scripting
Database: Oracle 10g, Teradata, Microsoft SQL Server, MySQL, DB2, SQL, RDBMS.
Programming Languages: Java, J2EE, JDBC, JSP, Java Servlets, JUNIT, Python, Scala.
Web Servers: Apache Tomcat 5.x, BEA WebLogic 8.x, IBM WebSphere 6.0/5.1.1
NoSQL Databases: HBase, MongoDB
OS/Platforms: Mac OS X 10.9.5, Windows, Linux, Unix
Client Side: JavaScript, CSS, HTML, jQuery
SDLC Methodology: Agile (SCRUM), Waterfall.
PROFESSIONAL EXPERIENCE
Confidential, San Francisco, CA
Sr. Hadoop Developer
Responsibilities:
- Created Hive Tables, loaded retail transactional data from Teradata using Sqoop.
- Loaded home mortgage data from the existing DWH tables (SQL Server) to HDFS using Sqoop.
- Loaded the load-ready files from mainframes to Hadoop, converting the files to ASCII format.
- Set up a fully distributed Apache Hadoop cluster using HDFS and MapReduce along with sub-projects such as Pig, Hive, Ambari, and Oozie.
- Worked on major and minor upgrades of HBase and Cassandra clusters.
- Involved in developing Unix scripts for validating source files and creating transformation and load jobs for 4 modules (Ongoing Advice, Advice Details, Case Details, Advice Fee Payment).
- Involved in writing complex SQL queries, Stored Procedures, triggers to access the data from Relational database.
- Extensively used Pig for data cleansing; proficient work experience with NoSQL (MongoDB) databases.
- Wrote Python applications to interact with the MySQL database using Spark's SQLContext and accessed Hive tables using HiveContext.
- Involved in developing Hive DDLs to create, alter and drop Hive tables.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Implemented a Storm topology with stream groupings to perform real-time analytical operations.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala, as well as in databases such as HBase and MongoDB.
- Collaborated with big data partners, including Cloudera, Hortonworks, and MapR, for Supermicro integrated solutions.
- Involved in processing ingested raw data using MapReduce, Apache Pig, and Hive.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into the Hadoop Distributed File System (HDFS), and moved data between MySQL and HDFS in both directions using Sqoop.
- Experience in collecting metrics for Hadoop clusters using Ganglia and Ambari.
- Used Python scripts to update the content in database and manipulate files
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Worked with NoSQL databases like HBase and MongoDB for POC purposes.
- Built a recommendation engine for portfolio and research articles using Apache Spark and MongoDB.
- Developed Spark SQL scripts and converted Hive UDFs to Spark SQL UDFs (see the sketch at the end of this role).
- Responsible for batch processing and real time processing in HDFS and NOSQL Databases.
- Responsible for retrieval of data from Cassandra and ingestion into Pig.
- Experience in customizing the MapReduce framework at various levels by writing custom InputFormats, RecordReaders, Partitioners, and data types.
- Experienced with multiple file formats in Hive, including Avro and SequenceFile.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Pig Script.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Involved in Hive-HBase integration by creating Hive external tables with HBase as the storage handler.
- Developed Spark scripts using the Python shell (PySpark) as per the requirements.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Experienced in defining job flows to run multiple Map Reduce and Pig jobs using Oozie.
- Installed and configured Hive and wrote HiveQL scripts.
- Database products: MS SQL, Hadoop/Apache MapR, Oracle, DB2, Informix Online.
- Experience creating ETL jobs to load JSON and server data into MongoDB and to move the transformed MongoDB data into the data warehouse.
- Created reports and dashboards using structured and unstructured data.
- Implemented HBase coprocessors (observers) for event-based analysis.
Environment: MapReduce, Spark SQL, Pig scripts, ETL, Flume, Kafka, Storm, MapR, Hadoop BI, Pig UDFs, Oozie, Avro, Hive, Java, Eclipse, Zookeeper.
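For illustration, a minimal Scala sketch of the Spark SQL work referenced above: querying an existing Hive table through HiveContext and re-registering a Hive UDF as a Spark SQL UDF, using the Spark 1.x API. The table name, column names, and UDF logic are assumed placeholders rather than the actual project artifacts.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object MortgageHiveQuery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mortgage-hive-query"))
    val hiveContext = new HiveContext(sc)

    // A Hive UDF that normalized state codes, re-expressed here as a Spark SQL UDF.
    hiveContext.udf.register("norm_state",
      (s: String) => if (s == null) null else s.trim.toUpperCase)

    // Query a Hive table previously loaded via Sqoop from the DWH (placeholder names).
    val report = hiveContext.sql(
      """SELECT norm_state(state) AS state, COUNT(*) AS loans
        |FROM mortgage_txn
        |GROUP BY norm_state(state)""".stripMargin)

    report.show(20)
    sc.stop()
  }
}
```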
Confidential, Denver, CO
Sr. Hadoop Developer
Responsibilities:
- Developed data pipeline using Flume, Sqoop, Pig and map reduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple jobs in Pig and Hive for data cleaning and pre-processing.
- Participated in Development and Implementation of MapR environment.
- Used Pig for transformations, event joins, filtering of bot traffic, and some pre-aggregations before storing the data onto HDFS.
- Involved in developing Pig UDFs for needed functionality not available out of the box in Apache Pig.
- Implemented POCs using Apache Kafka, Storm, and Spark.
- Importing and exporting data into HDFS and Hive using SQOOP.
- Experienced in querying data from various servers into MapR-FS.
- Used Pig as an ETL tool for transformations, event joins, and some pre-aggregations before storing the data onto HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting; POC work is in progress using Spark and Kafka for real-time processing.
- Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experienced in managing and reviewing the Hadoop log files.
- Responsible for managing data coming from different sources.
- Involved in Unit testing and delivered Unit test plans and results documents.
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Worked on Oozie workflow engine for job scheduling.
- Importing and exporting data into MapR-FS and Hive using Sqoop.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Migrated MapReduce jobs into Spark using Scala.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Expertise in data modeling and data warehouse design and development.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Imported data from different sources such as HDFS and HBase into Spark RDDs (see the sketch at the end of this role).
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Performed real time analysis on the incoming data.
Environment: MapReduce, HDFS, Hive, Pig, Spark, Spark-Streaming, Spark SQL, MapR, Storm, Apache Kafka, Sqoop, Java, Scala, CDH4, CDH5, AWS, Eclipse, Oracle, Git, Shell Scripting and Cassandra.
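As referenced in the bullets above, a minimal Scala sketch of loading an HBase table into a Spark RDD for in-memory computation (Spark 1.x with HBase's TableInputFormat); the table, column family, qualifier, and value encoding are assumptions for illustration.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object HBaseToRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-to-rdd"))

    // Point TableInputFormat at the source table (placeholder name).
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "customer_history")

    // Each record is (row key, Result); the scan is distributed across region servers.
    val rows = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // Pull one column per row and compute a simple in-memory aggregate.
    val balances = rows.map { case (_, result) =>
      Option(result.getValue(Bytes.toBytes("acct"), Bytes.toBytes("balance")))
        .map(v => Bytes.toString(v).toDouble)
        .getOrElse(0.0)
    }
    println(s"rows=${balances.count()}, total=${balances.sum()}")
    sc.stop()
  }
}
```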
Confidential, Santa Clara, CA
Hadoop Developer
Responsibilities:
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from Oracle into HDFS using Sqoop.
- Implemented automated methods and industry best practices for consistent installation and configuration of Greenplum across production and non-production environments.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Installed and configured Cloudera Manager for easy management of existing Hadoop cluster.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Responsible for managing and reviewing Hadoop log files. Designed and developed data management system using MySQL.
- Developed entire frontend and backend modules using Python on the Django web framework.
- Wrote Python scripts to parse XML documents and load the data into the database.
- Performed cluster maintenance, including creation and removal of nodes, using Cloudera Manager Enterprise and other tools.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Some hands-on data processing using Spark.
- Worked on NoSQL databases including HBase and ElasticSearch.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Used Tableau as a reporting and data visualization tool.
- Worked with Java, J2EE, Struts, Web Services, and Hibernate in a fast-paced development environment.
- Followed Agile methodology; interacted directly with the client to provide and receive feedback on features, suggest and implement optimal solutions, and tailor the application to customer needs.
- Set up proxy rules for applications in the Apache server and created Spark SQL queries for faster requests (see the sketch at the end of this role).
- Designed and developed the database design document and database diagrams based on the requirements.
- Developed UI of Web Service using Struts MVC Framework.
- Implemented Struts validation framework.
- Installed and configured Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Implemented Web Service security on JBoss Server.
- Implemented DAOs for data access using Spring ORM with Hibernate.
- Implemented/optimized complex stored procedures for performance enhancements.
- Designed the XML Schema for data transmission using xml documents.
Environment: HDFS, Hive, Pig, UNIX, SQL, Java MapReduce, Spark, Hadoop cluster, HBase, Sqoop, Oozie, Linux, Data Pipeline, Greenplum, Kafka, Python, MySQL, Storm, MapR-DB.
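The web-server-log extraction for this role was written in Pig Latin and analyzed with Hive; purely for illustration, the sketch below re-expresses a comparable filter/aggregate step as a Spark SQL query in Scala (Spark 1.x API), since Spark SQL queries are also referenced above. The HDFS path and the space-delimited log layout are assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Assumed log layout: "<ip> <status> <url>" per line.
case class AccessLog(ip: String, status: Int, url: String)

object WebLogReport {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("web-log-report"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Parse the raw logs, dropping malformed lines before loading into a DataFrame.
    val logs = sc.textFile("hdfs:///data/weblogs/*.log")
      .map(_.split(" "))
      .filter(f => f.length >= 3 && f(1).forall(_.isDigit))
      .map(f => AccessLog(f(0), f(1).toInt, f(2)))
      .toDF()

    // Register a temp table and run a Spark SQL aggregation over it.
    logs.registerTempTable("weblogs")
    sqlContext.sql(
      "SELECT url, COUNT(*) AS hits FROM weblogs WHERE status = 200 GROUP BY url ORDER BY hits DESC")
      .show(10)

    sc.stop()
  }
}
```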
Confidential
Java Developer
Responsibilities:
- Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC) using agile development methodology.
- Used JSP, JavaScript, HTML5, and CSS for manipulating, validating, and customizing error messages in the user interface. Used JBoss for EJB and JTA, and for caching and clustering purposes.
- Built presentation components in JSP pages using ICEfaces tag libraries.
- Responsible for the deployment of the code on the staging/QA server.
- Developed the GUI using JSP, AJAX, JavaScript, and the Spring framework. Involved in the development of Spring Framework controllers.
- Configured the URL mappings and bean classes using the Spring app-servlet.xml. Sybase was the database and MyBatis was used.
- Integrated push notifications for Android/iPhone using Javapns and GCM for the application.
- Worked with field engineers and teams to make the product more user-friendly. Performed testing for the GUI and back end.
- Wrote Web Services using SOAP for sending and getting data from the external interface.
- Used XSL/XSLT for transforming and displaying reports; developed schemas for XML.
- Involved in development of web interface using JSP, JSTL, Servlets, JavaScript and JDBC for administering and managing users and clients.
- Integrated third-party custom picker plugins into the application using jQuery for iPhone/Android web browsers.
- Used design patterns such as Business Delegate, Service Locator, Model View Controller, Session, and DAO.
- Responsible for the design of customizable headers and footers using the Tiles framework with Spring, and used JdbcTemplate to perform database operations on the server side (see the sketch at the end of this role).
Environment: J2EE, Java, Servlets, JSP, SQL, XML, JavaScript, JSTL, Ajax, CSS, Agile methodology, Java multithreading, collections, WebSphere, HTML5.
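For illustration of the JdbcTemplate usage mentioned above, a minimal sketch written in Scala for consistency with the earlier sketches (the project itself used Java with Spring and a Sybase database); the JDBC driver class, URL, credentials, table, and column are placeholders, not actual project values.

```scala
import org.springframework.jdbc.core.JdbcTemplate
import org.springframework.jdbc.datasource.DriverManagerDataSource

object HeaderFooterDao {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details; a real deployment would use a container-managed DataSource.
    val ds = new DriverManagerDataSource()
    ds.setDriverClassName("com.sybase.jdbc4.jdbc.SybDriver")
    ds.setUrl("jdbc:sybase:Tds:dbhost:5000/appdb")
    ds.setUsername("app_user")
    ds.setPassword("secret")

    val jdbc = new JdbcTemplate(ds)

    // Parameterized query via JdbcTemplate instead of hand-built SQL strings.
    val count = jdbc.queryForObject(
      "SELECT COUNT(*) FROM page_layout WHERE region = ?", classOf[Integer], "header")
    println(s"header layouts: $count")
  }
}
```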
Confidential
Java Developer
Responsibilities:
- Used JDBC, SQL and PL/SQL programming for storing, retrieving, manipulating the data.
- Responsible for creation of the project structure, development of the application with Java, J2EE and management of the code.
- Responsible for the Design and management of database in DB2 using Toad tool.
- Integrated third party plug-in tool for data tables with dynamic data using jQuery.
- Responsible for the deployment of the application on the server using IBM WebSphere and putty.
- Developed the application in an Agile environment with constant changes in the application scope and deadlines.
- Involved in designing and development of the ecommerce site using JSP, Servlets, EJBs, JavaScript and JDBC.
- Involved in client interaction and support for the application testing at the client location.
- Used AJAX for interactive user operations and client-side validations. Used XSL transforms on certain XML data.
- Performed an active role in the Integration of various systems present in the application.
- Responsible for providing services for mobile requests based on the user request.
- Performed logging of all debug, error, and warning messages at the code level using log4j.
- Involved in the UAT phase and production phase to provide continuous support to the onsite team.
- Used the HP Quality Center tool to actively resolve any bugs logged in any of the testing phases.
- Used XML for ORM mapping relations with the java classes and the database.
- Developed ANT script for compiling and deployment. Performed unit testing using JUnit.
- Used Subversion as the version control system. Extensively used Log4j for logging the log files.
Environment: Java, J2EE, PL/SQL, JSP, HTML, AJAX, Java Script, JDBC, XML, JMS, UML, JUnit.
Confidential
Java Developer
Responsibilities:
- Developed the applications using Java, J2EE, Struts, JDBC.
- Built applications for scale using JavaScript, Node.js, and React.js.
- Used SOAP UI Pro version for testing the Web Services.
- Involved in preparing the High Level and Detail level design of the system using J2EE.
- Created struts form beans, action classes, JSPs following Struts framework standards.
- Implemented the database connectivity using JDBC with Oracle 9i database as backend.
- Involved in the development of the underwriting process, which involves communication with outside systems using IBM MQ and JMS.
- Created a deployment procedure utilizing Jenkins CI to run the unit tests.
- Worked with JMS Queues for sending messages in point-to-point mode.
- Used PL/SQL stored procedures for applications that needed to execute as part of scheduling mechanisms.
- Developed SOAP based XML web services.
- Used JAXB to manipulate XML documents.
- Created XML documents using the StAX XML API to pass the XML structure to Web Services.
- Used Rational Clear Case for version control and JUnit for unit testing.
- Provided troubleshooting and error handling support in multiple projects.
Environment: JSP 1.2, Jasper Reports, JMS, XML, SOAP, JDBC, JavaScript, UML, HTML, JNDI, Apache Tomcat, ANT, and JUnit.