Hadoop Developer Resume
Bellevue, WA
SUMMARY
- 7+ years of IT experience in application development with Java and Big Data.
- Experience in Hadoop and its sub-modules: HDFS, MapReduce, Apache Pig, Hive, HBase, and Sqoop.
- Experienced with Hadoop ecosystem components such as MapReduce, Cloudera, Hortonworks, HBase, Oozie, Hive, Sqoop, Pig, Flume, and Cassandra.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- 2 years of in-depth experience with the Cassandra database.
- Hadoop administrator and developer: set up, configured, and monitored Hadoop clusters; performed cluster performance tuning.
- Skilled in managing and reviewing Hadoop log files.
- Expert in importing data into and exporting data out of HDFS and Hive using Sqoop.
- Experienced in loading data into Hive partitions and creating buckets in Hive.
- Worked with HBase and Cassandra NoSQL databases.
- Experienced in configuring Flume to stream data into HDFS.
- Familiar with the Hadoop architecture and its components, such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, and DataNode.
- Experienced in application development using Java, Hadoop, RDBMS, and Linux shell scripting, as well as performance tuning.
- Familiar with ZooKeeper, a distributed coordination service.
- Excellent knowledge of Java and SQL for application development and deployment.
- In-depth understanding of data structures, algorithms, and optimization.
- Worked with relational databases such as MySQL and Oracle, and NoSQL databases such as HBase and Cassandra.
- Well versed with databases such as MS SQL Server 2012 and 2008, Oracle 11g/10g/9i, and MySQL.
- Versatile experience using Java tools in business, web, and client-server environments, including the Java platform, JSP, Servlets, JavaBeans, and JDBC.
- Expertise in developing presentation-layer components with HTML, CSS, JavaScript, jQuery, XML, JSON, AJAX, and D3.
- Experienced with source control repositories such as SVN and GitHub.
- Experienced in detailed system design using use-case analysis, functional analysis, and UML modeling with class, sequence, activity, and state diagrams.
- Worked with data warehouse architecture and design: star schema, snowflake schema, fact and dimension tables, and physical and logical data modeling.
- Designed mapping documents for Big Data applications.
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Cloudera CDH4/CDH5, Hadoop Streaming, ZooKeeper, Oozie, and Flume
NoSQL: HBase, MongoDB, Cassandra
Languages: Java/ J2EE, SQL, Shell Scripting, C/C++, Python
Web Technologies: HTML, JavaScript, CSS, XML, Servlets, SOAP, Amazon AWS, Google App Engine
Web/Application Servers: Apache Tomcat, LDAP, JBoss, IIS
Operating Systems: Windows, Linux, and UNIX
Frameworks: Spring, MVC, Hibernate, Swing
DBMS / RDBMS: Oracle 11g/10g/9i, SQL Server 2012/2008, MySQL
IDE: Eclipse, Microsoft Visual Studio (2008, 2012), NetBeans, Spring Tool Suite
Version Control: SVN, CVS, Rational ClearCase Remote Client v7.0.1, GitHub
Tools: FileZilla, PuTTY, TOAD SQL Client, MySQL Workbench, JUnit, Oracle SQL Developer, WinSCP, Tahiti Viewer, Cygwin
PROFESSIONAL EXPERIENCE
Confidential, Bellevue WA
Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Responsible for importing log files from various sources into HDFS using Flume.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Extracted data from MongoDB using Sqoop, placed it in HDFS, and processed it.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Created partitions and buckets based on state for further processing with bucket-based Hive joins.
- Estimated hardware requirements for the NameNode and DataNodes and planned the cluster.
- Created Hive generic UDFs, UDAFs, and UDTFs in Python to process business logic that varies by policy.
- Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
- Cassandra developer: set up, configured, and optimized the Cassandra cluster; developed a real-time Java-based application that works with the Cassandra database.
- Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
- Worked with Kafka on a proof of concept for log processing on a distributed system.
- Worked with the NoSQL database HBase to create tables and store data.
- Developed custom Pig loader and storage classes to handle a variety of data formats such as JSON and XML.
- Experience upgrading Apache Ambari, CDH, and HDP clusters.
- Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
- Experienced with different compression techniques such as LZO, gzip, and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
- Created a data pipeline of MapReduce programs using chained mappers.
- Implemented an optimized join of different data sets in MapReduce to get the top claims by state.
- Implemented MapReduce programs that perform joins on the map side using the distributed cache in Java (see the sketch after this role).
- Developed unit test cases using the JUnit, EasyMock, and MRUnit testing frameworks.
- Experience upgrading the Hadoop cluster (HBase/ZooKeeper) from CDH3 to CDH4.
- Created a complete processing engine, based on Cloudera's distribution, tuned for performance.
- Experienced in monitoring the cluster using Cloudera Manager.
Environment: Hadoop, HDFS, HBase, MapReduce, Java, JDK 1.5, J2EE 1.4, Struts 1.3, Hive, Pig, Sqoop, Flume, Kafka, Oozie, Hue, Hortonworks, Storm, Zookeeper, AVRO Files, SQL, ETL, Cloudera Manager, MySQL, CSS, MongoDB.
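A minimal, illustrative sketch of the map-side join referenced above, assuming a small state-lookup file is shipped to every node via the DistributedCache and joined in memory against each claim record; class, path, and field names are hypothetical, not the production code.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-side join: the driver registers the small lookup file with
// DistributedCache.addCacheFile(new URI("/lookup/states.csv"), conf);
// each mapper loads it into memory in setup() and joins in map(),
// so no reduce phase is needed for the join itself.
public class ClaimStateJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> stateLookup = new HashMap<String, String>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if (cacheFiles != null && cacheFiles.length > 0) {
            BufferedReader reader = new BufferedReader(new FileReader(cacheFiles[0].toString()));
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",");   // e.g. "WA,Washington"
                stateLookup.put(parts[0], parts[1]);
            }
            reader.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");   // e.g. claimId,stateCode,amount
        String stateName = stateLookup.get(fields[1]);
        if (stateName != null) {
            context.write(new Text(stateName), new Text(fields[0] + "," + fields[2]));
        }
    }
}
```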
Confidential, Peoria IL
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop and the Hadoop stack on a 16-node cluster.
- Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables.
- Involved in data ingestion into HDFS using Sqoop from a variety of sources, using connectors such as JDBC and import parameters.
- Analyzed large and critical datasets in the Global Risk Investment and Treasury Technology (GRITT) domain using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, ZooKeeper, and Mahout.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Exported the data from Avro files and indexed the documents in sequence file format.
- Implemented various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and using compression codecs wherever necessary.
- Implemented test scripts to support test-driven development and continuous integration.
- Involved in Cassandra data modeling and analysis using CQL (Cassandra Query Language). Installed, configured, and operated data integration and analytics tools, i.e. Informatica, Chorus, SQLFire, and GemFire XD, for business needs.
- Developed scripts to automate routine DBA tasks (e.g., refreshes, backups, vacuuming).
- Installed and configured Hive and wrote Hive UDFs that helped spot market trends (illustrated in the sketch following this role).
- Used Hadoop Streaming to process terabytes of data in XML format.
- Involved in loading data from the UNIX file system to HDFS.
- Implemented the Fair Scheduler on the JobTracker with appropriate parameters to share cluster resources among the users' MapReduce jobs.
- Involved in creating Hive tables, loading data into them, and writing Hive queries to analyze the data.
- Gained very good business knowledge of different product categories and their designs.
Environment: CDH4 with Hadoop 1.x, HDFS, Pig, Cloudera, Hive, HBase, ZooKeeper, MapReduce, Java, Sqoop, Oozie, Hortonworks, Storm, ETL, CSS, Ambari, Linux, UNIX Shell Scripting and Big Data.
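A minimal sketch of the kind of Hive UDF mentioned above, assuming a simple text-normalization use case; the class name and column semantics are illustrative stand-ins, not the actual trend-analysis UDF.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF: normalizes a free-text product category so trend queries
// can GROUP BY a consistent value. Registered in Hive with, for example:
//   ADD JAR trend-udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_category AS 'NormalizeCategoryUDF';
public final class NormalizeCategoryUDF extends UDF {

    public Text evaluate(Text category) {
        if (category == null) {
            return null;
        }
        String normalized = category.toString().trim().toLowerCase().replaceAll("\\s+", " ");
        return new Text(normalized);
    }
}
```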
Confidential, Chicago IL
Java/J2EE Developer
Responsibilities:
- Created design documents and reviewed them with the team, in addition to assisting the business analyst/project manager with explanations to the line of business.
- Responsible for understanding the scope of the project and requirement gathering.
- Involved in analysis, design, construction and testing of the application
- Developed the web tier using JSP to show account details and summary.
- Designed and developed the UI using JSP, HTML, CSS and JavaScript.
- Utilized JPA for object/relational mapping to achieve transparent persistence onto the SQL Server database (see the sketch after this role).
- Used the Tomcat web server for development purposes.
- Involved in the creation of test cases for JUnit testing.
- Used Oracle as the database and TOAD for query execution; also involved in writing SQL scripts and PL/SQL code for procedures and functions.
- Used CVS for version control.
- Developed the application using Eclipse and used Maven as the build and deployment tool.
- Used Log4J to print logging, debug, warning, and info messages on the server console.
Environment: Java, J2EE Servlet, JSP, JUnit, AJAX, XML, JavaScript, Log4j, CVS, Maven, Eclipse, Apache Tomcat, and Oracle.
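A minimal sketch of the JPA object/relational mapping approach noted above; the entity, table, and column names are hypothetical, since the actual account-summary schema is not described.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

// JPA entity mapped onto a relational table; the persistence provider handles
// the object/relational translation, so no hand-written SQL is needed for CRUD.
@Entity
@Table(name = "ACCOUNT_SUMMARY")
public class AccountSummary {

    @Id
    @Column(name = "ACCOUNT_ID")
    private Long accountId;

    @Column(name = "ACCOUNT_NAME")
    private String accountName;

    @Column(name = "BALANCE")
    private double balance;

    public Long getAccountId() { return accountId; }
    public String getAccountName() { return accountName; }
    public double getBalance() { return balance; }
}
```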
Confidential
Java Developer
Responsibilities:
- Developed web components using JSP, Servlets and JDBC
- Designed tables and indexes
- Designed, implemented, tested, and deployed Enterprise JavaBeans, both session and entity beans, using WebLogic as the application server.
- Developed stored procedures, packages, and database triggers to enforce data integrity. Performed data analysis and created Crystal Reports for user requirements.
- Provided quick turnaround, resolving issues within the SLA.
- Implemented the presentation layer with HTML, XHTML and JavaScript
- Used EJBs to develop business logic and coded reusable components in Java Beans
- Developed database interaction code against the JDBC API, making extensive use of SQL query statements and advanced PreparedStatements (see the sketch after this role).
- Used connection pooling for optimization through the JDBC interface.
- Used EJB entity and session beans to implement business logic, session handling, and transactions. Developed the user interface using JSP, Servlets, and JavaScript.
- Wrote complex SQL queries and stored procedures
- Actively involved in the system testing
- Prepared the installation guide, customer guide, and configuration document, which were delivered to the customer along with the product.
Environment: Windows NT/2000/2003, XP, Windows 7/8, C, Java, UNIX, SQL (using TOAD), Microsoft Office Suite, Microsoft Project
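A minimal sketch of the pooled-connection JDBC access pattern described above, assuming a container-managed DataSource looked up via JNDI; the JNDI name, DAO class, and query are illustrative.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

// DAO-style lookup: borrows a connection from the container-managed pool
// (resolved through JNDI) and runs a parameterized query via PreparedStatement.
public class OrderDao {

    public double findOrderTotal(long orderId) throws NamingException, SQLException {
        DataSource ds = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/AppDS");
        Connection conn = ds.getConnection();   // pooled connection
        try {
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT total_amount FROM orders WHERE order_id = ?");
            ps.setLong(1, orderId);
            ResultSet rs = ps.executeQuery();
            return rs.next() ? rs.getDouble(1) : 0.0;
        } finally {
            conn.close();                        // returns the connection to the pool
        }
    }
}
```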
Confidential
Java Developer
Responsibilities:
- Built the application on an MVC architecture with JSP 1.2 as the presentation layer and Servlets as the controller, using the Jakarta Struts 1.1 framework: developed action classes and form beans, and used the Struts Validation Framework to validate front-end forms.
- Extensively used XML Web Services for transferring/retrieving data between different providers.
- Developed the complete business tier with session beans and CMP entity beans to EJB 2.0 standards, using JMS queue communication in the authorization module.
- Designed and implemented Business Delegate, Session Facade and DTO Design Patterns
- Involved in implementing the DAO pattern
- Used the JAXB API to bind the XML schema to Java classes (see the sketch at the end of this role).
- Used database report generation written in PL/SQL.
- Used Maven for building the enterprise application modules
- Used Log4J to monitor the error logs
- Used JUnit for unit testing
- Used SVN for Version control
- Deployed the applications on WebLogic Application Server.
Environment: Struts 1.1, EJB 2.0, Servlets 2.3, JSP 1.2, SQL, XML, XSLT, Web Services, JAXB, SOAP, WSDL, JMS1.1, JavaScript, TDD, JDBC, Oracle 9i, PL/SQL, Log4J, JUnit, WebLogic, Eclipse, Rational XDE, SVN, Linux
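A minimal sketch of the JAXB binding mentioned above; the OrderRequest type and its fields are hypothetical stand-ins for the schema-generated classes, shown here hand-annotated for brevity.

```java
import java.io.File;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// Class bound to an XML schema element; JAXB converts the XML payload
// exchanged with the web service into this Java object and back.
@XmlRootElement(name = "orderRequest")
class OrderRequest {
    @XmlElement public String customerId;
    @XmlElement public String productCode;
    @XmlElement public int quantity;
}

public class JaxbExample {
    public static void main(String[] args) throws JAXBException {
        JAXBContext context = JAXBContext.newInstance(OrderRequest.class);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        OrderRequest request = (OrderRequest) unmarshaller.unmarshal(new File("orderRequest.xml"));
        System.out.println(request.customerId + " ordered " + request.quantity);
    }
}
```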