Hadoop Developer Resume
San Francisco, CA
SUMMARY:
- 9+ years of experience in Java, J2EE, and Big Data technologies, with working knowledge of Hadoop and its stack, including big data analytics, and expertise in application design and development across various domains, with an emphasis on data warehousing tools using industry-accepted methodologies and procedures.
- 3+ years of experience in Hadoop development/administration and four years of Java application development.
- Experienced Hadoop developer with a strong background in distributed file systems in the big data arena; understands the complex processing needs of big data and has experience developing code and modules to address those needs.
- Experience in using Pig, Hive, Sqoop, Oozie, Ambari and Cloudera Manager.
- Experience working with the Hive data warehouse system, developing data pipelines, implementing complex business logic and optimizing Hive queries.
- Design and implementation of business logic and data processing routes using Apache Camel.
- Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager, Node Manager, Application Master and Map Reduce concepts.
- Experience with complex data processing pipelines, including ETL and data ingestion dealing with unstructured and semi-structured data
- Good knowledge on designing and implementing ETL process to load data from various data sources to HDFS using Flume and Sqoop, performing transformation logic using Hive, Pig and integration with BI tools for visualization/reporting
- Extensively used ETL methodology to support Extract, Transform, and Load environments.
- Well versed in installing, configuring, supporting and managing Hadoop clusters and the underlying big data infrastructure, including CDH3 and CDH4 clusters.
- Experience in installing, configuring, and administering Hadoop clusters for various distributions such as Apache, Cloudera, MapR and Hortonworks 2.2.
- Very good understanding of NoSQL databases like MongoDB, HBase and Cassandra.
- Knowledgeable in Spark and Scala, mainly in framework exploration for the transition from Hadoop MapReduce to Spark.
- Experience in using Apache Flume for collecting, aggregating and moving large amounts of data from application servers.
- Hands-on experience with message brokers such as Apache Kafka, IBM WebSphere, and RabbitMQ.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Analyzed data on a Cassandra cluster by running queries for searching, sorting and grouping.
- Knowledge of coding with Kafka and Spark.
- Knowledge of data lakes, processing with graph databases, and Scala.
- Hands-on experience in RDBMS and Linux shell scripting.
- Extended Hive and Pig core functionality by writing custom UDFs (see the sketch following this summary).
- Experience in analyzing data using HiveQL, Pig Latin and Map Reduce.
- Developed Map Reduce jobs to automate transfer of data from HBase.
- Knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Hands on knowledge of writing code in Scala.
- Expertise in RDBMS like MS SQL Server, MySQL, Greenplum and DB2, and NoSQL databases like Cassandra.
- Knowledge of job workflow scheduling and monitoring, and of administrative tasks such as installing Hadoop, commissioning and decommissioning nodes, and managing ecosystem components such as Flume, Oozie, Hive and Pig.
- Excellent Java development skills using J2EE, Spring, J2SE, Servlets, JUnit, MRUnit, JSP, JDBC.
- Knowledge in Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment and Support).
- Experience in writing complex SQL queries involving multiple tables with inner and outer joins.
- Experience in optimizing queries by creating various clustered and non-clustered indexes and indexed views, using data modeling concepts.
- Experience with Oracle 9i PL/SQL programming and SQL*Plus.
- Experience in Object-Oriented Analysis and Design (OOAD) and development of software using UML methodology.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimates, designing custom solutions, development, leading developers, producing documentation, and production support.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
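A minimal sketch of the kind of custom Hive UDF referenced above, for illustration only; the package, class name and normalization logic are assumptions rather than details from an actual project:

```java
// Illustrative Hive UDF (classic UDF API): trims and upper-cases a string column.
// Package and class names are hypothetical.
package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class NormalizeText extends UDF {
    // Hive calls evaluate() once per row; returning null preserves SQL NULL semantics.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

In Hive such a function would be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being used in a query.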
TECHNICAL SKILLS:
Programming Languages: C, C++, Java, Shell Scripting, PL/SQL
J2SE/J2EE Technologies: Java, J2EE, Servlets, JSP, Java Beans, JDBC, Struts 2.0, EJB, Spring, Hibernate, JTA, JMS, Web Services.
IDEs: RAD 6.0, Eclipse 3.1 with MyEclipse 4.1.1, Rational Rose 98/2000, NetBeans
Big Data Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Impala, Kafka, Oozie, Spark, ZooKeeper, Flume, Storm, AWS, EC2, EMR
Web Technologies: HTML, DHTML, XHTML, CSS, JavaScript, MyFaces, RichFaces, JSF, PHP and AJAX.
Monitoring Tools: Ganglia, Nagios, Cloudera Manager, Ambari
Security: Kerberos, Sentry
XML Technologies: XML, XSL, XQuery, XSD and XSLT.
Operating Systems: HP-UX, LINUX, Windows 9X/2000/XP
Databases: Oracle 10g/9i, MySQL, DB2, MS-SQL Server
Web Servers: WebSphere 5.1/6.0, WebLogic Application Server, JBoss, J2EE Server 1.4, Apache Tomcat 4.1/5.1, IBM HTTP Server.
Methodologies: Unified Modeling Language (UML), Rational Unified Process (RUP), Agile.
PROFESSIONAL EXPERIENCE:
Confidential, San Francisco,CA
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop stack and different big data analytic tools including Pig, Hive, the HBase database and Sqoop.
- Extensively involved in installation and configuration of the Cloudera distribution of Hadoop, its NameNode, Secondary NameNode, JobTracker, TaskTrackers and DataNodes.
- Designed high-level ETL architecture for overall data transfer from OLTP to OLAP.
- Created various documents such as the Source-to-Target Data Mapping Document, Unit Test Cases and the Data Migration Document.
- Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, JVM tuning, and map/reduce slot configuration.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig and Sqoop.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Provided cluster coordination services through ZooKeeper.
- Created mappings using the transformations like Source Qualifier, Aggregator, Expression, Lookup, Router, Normalizer, Filter, Update Strategy and Joiner transformations.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented a script to transmit sys print information from Oracle to HBase using Sqoop.
- Worked on Hive for exposing data for further analysis and for transforming files from different analytical formats into text files.
- Implemented best income logic using Pig scripts and UDFs.
- Designed and implemented Spark test bench application to evaluate quality of recommendations made by the engine.
- Monitored log input from several data centers via Spark Streaming; the data was analyzed in Apache Storm, then parsed and saved into Cassandra.
- Implemented cluster balancing.
- Worked on statistical analysis for obtaining information from the data.
- Migrated high-volume OLTP transactions from Oracle to Cassandra in order to reduce the Oracle licensing footprint.
- Hands-on experience in large-scale data processing using Spark.
- SQL, streaming and complex analytics in the company are handled with the use of Spark.
- Implemented test scripts to support test driven development and continuous integration.
- Mentored the analyst and test teams in writing Hive queries.
- Worked on tuning the performance of Hive and Pig queries.
- Streamed data to Hadoop using Kafka.
- Wrote Java code for custom partitioners and Writables (see the sketch following this list).
- Experience in Enterprise Integration Development using Apache Camel Framework.
- Implemented Apache Camel routes using Camel-Spring XML and Camel-Spring processor beans.
- Worked on the Analytics Infrastructure team to develop a stream filtering system on top of Apache Kafka
- Designed and developed a decision tree application using the Neo4j graph database for fraud visuals.
- Worked on easing jobs by building applications on top of Cassandra.
- Classified keywords into categories using a Neo4j graph database to make recommendations.
- Exported data from databases to flat files, and performed data migration and data cleansing using Pentaho.
- Data ingestion to HBase and Hive using Storm bolts.
- Implemented Kafka-Storm topologies capable of handling and channelizing a high stream of data, and integrated the Storm topologies with Esper to filter and process that data across multiple clusters for complex event processing.
- Worked on generating reports from the Neo4j graph database.
- Supported in setting up QA environment and updating configurations for implementing scripts.
- Unit tested and tuned SQL and ETL code for better performance.
- Monitored the performance and identified performance bottlenecks in ETL code.
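A minimal sketch of the custom partitioner work referenced above (see the bullet on custom partitioners and Writables); the key layout and class name are illustrative assumptions:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative custom partitioner: routes all records sharing a key prefix
// (assumed format "region|recordId") to the same reducer.
public class RegionPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String region = key.toString().split("\\|", 2)[0];
        // Mask the sign bit so the modulo result is always a valid partition index.
        return (region.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

The job driver would enable it with job.setPartitionerClass(RegionPartitioner.class); a custom Writable follows the same pattern by implementing write(DataOutput) and readFields(DataInput).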
Environment: Informatica Power Center 9.5, Oracle 11g, DB2, Erwin 4.0, Unix Shell Scripting, Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, Yarn, Storm, Kafka, Linux, Java, Oozie, Spark, SQL, PL/SQL.
Confidential, El Segundo, CA
Hadoop Developer
Responsibilities:
- Involved in the complete software development life cycle (SDLC) to develop the application.
- Interacted with Business Analysts to understand the requirements and the impact of the ETL on the business.
- Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
- Worked on deploying Hadoop cluster with multiple nodes and different big data analytic tools including Pig, Hbase database and Sqoop.
- Involved in loading data from LINUX file system to HDFS.
- Experience in reviewing Hadoop log files to detect failures.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Along with the infrastructure team, was involved in designing and developing a Kafka- and Storm-based data pipeline; this pipeline also leverages Amazon Web Services EMR, S3 and RDS.
- Developed a continuous flow of data into HDFS from social feeds using Apache Storm spouts and bolts.
- Streamed data in real time using Spark with Kafka.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Implemented new Apache Camel routes and extended existing Camel routes that provide end-to-end communication between the web services and other enterprise back-end services.
- Used various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure, and Union to develop robust mappings in the Informatica Designer.
- Implemented test scripts to support test driven development and continuous integration.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Implemented custom code for MapReduce partitioners and custom Writables.
- Developed a Storm monitoring bolt for validating pump tag values against high/low and high-high/low-low thresholds from preloaded metadata.
- Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second. Used the Kafka producer 0.8.3 APIs to produce messages (see the producer sketch following this list).
- Used in-depth features of Tableau, like data blending from multiple data sources, for data analysis.
- Implemented agent-server messaging dialog using Camel.
- Implemented data ingestion and handling clusters in real time processing using Kafka.
- Performed benchmarking of the NoSQL databases Cassandra and HBase.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Data ingestion to HBase and Hive using Storm Bolts.
- Spark Streaming collects this data from Kafka in near real time, performs the necessary transformations and aggregation on the fly to build the common learner data model, and persists the data in a NoSQL store (HBase).
- Along with the infrastructure team, implemented a Kafka-Storm based data pipeline.
- Wrote Python scripts for internal testing that push data read from a file into a Kafka queue, which in turn is consumed by the Storm application.
- Experience with core distributed computing and data mining libraries using Apache Spark.
- Integrated bulk data into the Cassandra file system using MapReduce programs.
- Involved in creating data models for customer data using Cassandra Query Language.
- Supported MapReduce programs running on the cluster.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked on tuning the performance of Pig queries.
- Developed an analytical component using Scala, Spark and Spark Streaming.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Mentored the analyst and test teams in writing Hive queries.
- Installed the Oozie workflow engine to run multiple MapReduce jobs.
- Provided support to develop the entire warehouse architecture and plan the ETL process.
- Experience in developing UNIX shell scripts for automation of the ETL process.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Worked in collaboration with the BI team to create visual illustrations with Pentaho.
- Performed insurance-type necessity analysis for customers with Hadoop and Accumulo.
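A minimal sketch of a Kafka producer of the kind referenced above, shown with the newer Java producer client for readability; broker addresses, the topic name and the payload format are illustrative assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Illustrative producer that publishes string events to a placeholder topic.
public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder brokers
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "1"); // leader-only acks favor throughput on a high-volume feed

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                // Keyed sends keep related events in the same partition.
                producer.send(new ProducerRecord<>("pump-events", Integer.toString(i),
                        "event-payload-" + i));
            }
        }
    }
}
```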
Environment: Hadoop, Informatica Power Center 9.5, Oracle 11g, SQL Server, PL/SQL, XML, MS Access, Toad, UNIX, Autosys, HDFS, YARN, MapReduce, Hive, Pig, Spark, Storm, Sqoop, Kafka, Linux, Java, Oozie.
Confidential, Foster City, CA
Hadoop Developer
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Designed and implemented the ETL process using Informatica Power Center.
- Developed ETL flows from source to stage, stage to work tables, and stage to target tables.
- Facilitated knowledge transfer sessions.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the mapper sketch following this list).
- Importing and exporting data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Designed and presented a plan for a POC on Apache Storm.
- Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources.
- Gained good experience with NoSQL databases.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Used Apache Kafka for the durable messaging layer and Apache Storm for real-time transaction scoring.
- Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
- Developed a custom file system plugin for Hadoop so it can access files on the Data Platform.
- This plugin allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
- Tuned mappings and mapplets for best performance on the ETL side, and created indexes and analyzed tables periodically on the database side.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Set up a Hadoop cluster on Amazon EC2 using Whirr for a POC.
- Wrote a recommendation engine using Mahout.
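A minimal sketch of a data-cleaning MapReduce job of the kind referenced above; the delimiter, expected field count and class name are illustrative assumptions:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative map-only cleaning step: drops malformed CSV rows and trims the rest.
public class CleaningMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
    private static final int EXPECTED_FIELDS = 5; // hypothetical schema width

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split(",", -1);
        if (fields.length != EXPECTED_FIELDS) {
            // Count and skip malformed rows instead of failing the job.
            context.getCounter("cleaning", "malformed").increment(1);
            return;
        }
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                cleaned.append(',');
            }
            cleaned.append(fields[i].trim());
        }
        context.write(NullWritable.get(), new Text(cleaned.toString()));
    }
}
```

Run as a map-only job (zero reducers), this writes the cleaned records straight back to HDFS for downstream Hive or Pig processing.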
Environment: Informatica Power Center 9.1, Oracle 11g/10g, Flat files, Toad 9.6, SQL, PL/SQL, SQL*Plus, SQL Workbench, PuTTY, Java 6 (JDK 1.6), Eclipse, Subversion, Hadoop (Hortonworks and Cloudera distributions), HDFS, MapReduce, Hive, Storm, Kafka, HBase, DataStax, IBM DataStage 8.1, Linux, UNIX Shell Scripting, Windows NT.
Confidential, Irvine, CA
Hadoop Developer
Responsibilities:
- Involved in requirements analysis and prepared Requirements Specifications document.
- Designed implementation logic for core functionalities.
- Developed service layer logic for core modules using JSPs and Servlets and involved in integration with presentation layer.
- Involved in implementation of presentation layer logic using HTML, CSS, JavaScript and XHTML.
- Designed the front-end applications and user-interactive web pages using web technologies like AngularJS and NodeJS.
- Designed a MySQL database to store customers' general and billing details.
- Used JDBC connections to store and retrieve data from the database.
- Development of complex SQL queries and stored procedures to process and store the data.
- Used ANT, a build tool, to configure the application.
- Developed test cases using JUnit.
- Involved in unit testing and bug fixing.
- Prepared design documents for code developed and defect tracker maintenance.
Environment: Hive, Pig, Sqoop, Oozie, Map Reduce, Cassandra, MongoDB, UNIX, Shell Scripting.
Confidential, Doral, FL
ETL Developer
Responsibilities:
- Responsible for definition, development and testing of processes/programs necessary to extract data from operational databases, Transform and cleanse data, and Load it into data warehouse using Informatica Power center.
- Created the repository manager, users, user groups and their access profiles.
- Extracted data from different sources like SQL Server 2005 and flat files, and loaded it into the Oracle DWH.
- Created complex mappings in Power Center Designer using Expression, Filter, Sequence generator, Update Strategy, Joiner and Stored procedure transformations.
- Created connected and unconnected Lookup transformations to look up the data from the source and target tables.
- Wrote SQL, PL/SQL, and stored procedures for implementing business rules and transformations.
- Used data miner to process raw data from flat files.
- Used the update strategy to effectively migrate data from source to target.
- Responsible for writing SQL queries, stored procedures, views, triggers.
- Developed and documented Mappings/Transformations, Audit procedures and Informatica sessions.
- Used Informatica Power Center Workflow to create sessions and run the logic embedded in the mappings.
- Created test cases and completed unit, integration and system tests for Data warehouse.
Environment: Informatica Power Center 8.1.1, Oracle, Flat files, SQL Server 2005, SQL, PL/SQL, Windows 2003.
Confidential
JAVA Developer
Responsibilities:
- Performed in various phases of the Software Development Life Cycle (SDLC)
- Developed user interfaces using the JSP framework with AJAX, JavaScript, HTML, XHTML and CSS.
- Performed the design and development of various modules using the CBD Navigator Framework.
- Deployed J2EE applications on WebSphere Application Server by building and deploying EAR files using ANT scripts.
- Created tables and stored procedures in SQL for data manipulation and retrieval.
- Used technologies like JSP, JavaScript and Tiles for the presentation tier.
- Used CVS for version control of code and project documents.
Environment: JSP, Servlets, JDK, JDBC, XML, JavaScript, HTML, Spring MVC, JSF, Oracle, Sun Application Server, UML, JUnit, JTest, NetBeans, Windows 2000.
Confidential
JAVA Developer
Responsibilities:
- Involved in requirements analysis and prepared Requirements Specifications document.
- Designed implementation logic for core functionalities.
- Developed service layer logic for core modules using JSPs and Servlets and involved in integration with presentation layer.
- Involved in implementation of presentation layer logic using HTML, CSS, JavaScript and XHTML.
- Designed a MySQL database to store customers' general and billing details.
- Used JDBC connections to store and retrieve data from the database (see the sketch following this list).
- Developed complex SQL queries and stored procedures to process and store the data.
- Used ANT, a build tool, to configure the application.
- Developed test cases using JUnit.
- Involved in unit testing and bug fixing.
- Prepared design documents for code developed and defect tracker maintenance.
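A minimal sketch of the JDBC access pattern referenced above; the connection URL, credentials, table and column names are illustrative assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative DAO using plain JDBC against a placeholder MySQL schema.
public class CustomerDao {
    private static final String URL = "jdbc:mysql://localhost:3306/billing"; // placeholder
    private static final String USER = "app_user";  // placeholder
    private static final String PASS = "change_me"; // placeholder

    public void addCustomer(String name, String email) throws SQLException {
        String sql = "INSERT INTO customers (name, email) VALUES (?, ?)";
        try (Connection conn = DriverManager.getConnection(URL, USER, PASS);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, name);
            ps.setString(2, email);
            ps.executeUpdate();
        }
    }

    public String findEmail(String name) throws SQLException {
        String sql = "SELECT email FROM customers WHERE name = ?";
        try (Connection conn = DriverManager.getConnection(URL, USER, PASS);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("email") : null;
            }
        }
    }
}
```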
Environment: Java, J2EE (JSPs & Servlets), JUnit, HTML, CSS, JavaScript, Apache Tomcat, MySQL.