Hadoop Developer Resume
Palo Alto, CA
OBJECTIVE
- To advance my career at a leading company in a high-tech environment with committed and dedicated people, where I can fully explore and realize my potential.
SUMMARY
- 8 years of professional experience in the IT field, including 3+ years in the Hadoop ecosystem: real-time data processing using Storm and Spark, and deploying multi-node clusters using Cloudera and Hortonworks distributions.
- Good knowledge of Hadoop HDFS, MRv1, and MRv2 (YARN) frameworks and the Hadoop stack: Hive, HBase, Sqoop, Oozie, Pig, Zookeeper, Hue, Flume.
- Proficient in implementing both the Fair Scheduler and the Capacity Scheduler on the cluster as required for maximum cluster utilization.
- Experience in cluster capacity planning, deployment, performance tuning, administration, and monitoring of the Hadoop ecosystem.
- Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
- Experience installing and configuring Spark in standalone mode for testing and development environments.
- Experience integrating Kafka with Storm and Spark for real-time data processing.
- Developed custom Kafka producers and consumers for publishing and subscribing to Kafka brokers.
- Experience with Python and Core Java for developing big data ingestion and processing pipelines.
- Hands-on experience in installing, configuring, and deploying Hadoop distributions in cloud environments (Amazon Web Services).
- Experience in importing data from a relational database management system (RDBMS) such as MySQL and Oracle into the Hadoop Distributed File System (HDFS), and exporting the processed data back into RDBMS.
- Implemented Flume for collecting, aggregating, and moving large amounts of server logs and streaming data to HDFS.
- Integrated user data from Cassandra with data in HDFS, and integrated Cassandra with Storm for real-time user attribute lookup.
- Good knowledge of installation and administration of Cassandra, MongoDB, and HBase.
- Integrated Storm with MongoDB to load the processed data directly into MongoDB.
- Good knowledge of analyzing data using Pig Latin and HiveQL.
- Good knowledge of the software development life cycle (SDLC).
- Experience in indexing data stored in HDFS using Solr for searching.
- Experience in developing MVC architecture using Servlets, JSP, Struts Framework, Hibernate Framework and Spring Framework.
- Experience in using Java IDE tools such as IBM WebSphere Studio Application Developer (WSAD), Rational Application Developer, and Eclipse, and familiar with other IDEs such as NetBeans, JBuilder, and JDeveloper.
- Good knowledge of Data Modeling, Data Warehousing concepts, OLAP Concepts, Star Schema, Snowflake Schema, and Entity-Relationship Diagrams.
- Extensive work experience in ETL processes consisting of data sourcing, data transformation, mapping, and loading of data from multiple source systems into a data warehouse using Informatica PowerCenter.
- Proficient with version control tools such as SVN and Git.
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, Hive, Tez, Pig, Flume, Oozie, Zookeeper, HBase, Sqoop, Kafka, Storm, Solr, Spark.
Languages: C, C++, Java, Python, MATLAB, Scala, Pig Latin, Linux shell scripting, HiveQL.
Operating System: Linux, Windows XP, Windows 7.
Databases: MySQL, Oracle, SQL Server.
NoSQL: Cassandra, MongoDB, HBase.
Automation: Puppet, Shell Scripting.
Web Development: HTML, CSS, JavaScript, R Shiny.
Build Tools: Ant, Maven.
Version Control: SVN, Git.
Cluster Management and Monitoring: Cloudera Manager, Ambari, Ganglia, Nagios.
Security: Kerberos, LDAP.
PROFESSIONAL EXPERIENCE
Confidential, Palo Alto CA
Hadoop Developer
Responsibilities:
- Integrated Kafka with Spark for real-time data processing and stored the processed data directly in MongoDB and HDFS.
- Populated HDFS with huge amounts of data using Apache Kafka.
- Imported data from different sources into Spark RDD for processing.
- Developed Spark scripts using Scala and Python shell commands as required.
- Experience in managing and reviewing Hadoop log files.
- Experience with Hive partitioning and bucketing, performing joins on Hive tables, and implementing Hive SerDes such as Regex, JSON, and Avro.
- Optimized Hive analytics SQL queries, created tables and views, and wrote custom UDFs and Hive-based exception processing.
- Involved in transferring legacy Teradata tables to HDFS and HBase tables using Sqoop, and vice versa.
- Experience in running Hive queries using Tez.
- Experience in using the DistCp command to move data between two clusters.
- Experience with ETL processes.
- Configured Fair Scheduler to provide fair resources to all the applications across the cluster.
Environment: Hortonworks Hadoop, Ambari, Spark, Solr, Kafka, MongoDB, Linux, HDFS, Hive, Pig, Sqoop, Flume, Zookeeper, RDBMS.
Confidential, Chicago, IL
Hadoop Support Analyst
Responsibilities:
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the HDP cluster using Ambari.
- Developed custom MapReduce programs and custom User Defined Functions (UDFs) in Hive to transform large volumes of data according to business requirements.
- Worked with big data analysts, designers, and scientists to troubleshoot MapReduce job failures and issues with Hive, Pig, Flume, etc.
- Involved in HBase setup and storing data into HBase, which was used for further analysis.
- Deployed remote Hive Metastore using MySQL.
- Implemented HDFS high availability (HA) using Quorum Journal Manager (QJM) to avoid a single point of failure (SPOF).
- Added and removed cluster nodes.
- Imported weblogs from the web servers into HDFS using Flume.
- Worked on analyzing data with Hive and Pig.
- Performed a POC on Sqoop imports from heterogeneous data sources to HDFS.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
- Automated job operations such as start, stop, suspend, resume, and rerun using Oozie.
- Wrote custom shell scripts to automate repetitive tasks on the cluster.
Environment: Hortonworks Hadoop, Ambari, MongoDB, Linux, HDFS, Hive, Pig, Sqoop, Flume, Zookeeper, RDBMS, Oozie.
Confidential, NJ
Hadoop Consultant
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Flume, Oozie, Zookeeper, and Sqoop.
- Created POC to store Server Log data in MongoDB to identify System Alert Metrics.
- Implemented Hadoop framework to capture user navigation across the application to validate the user interface and provide analytic feedback/result to the UI team.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Wrote MapReduce jobs using Java API and Pig Latin.
- Used Flume to collect, aggregate and store the web log data onto HDFS.
- Wrote Pig scripts to run ETL jobs on the data in HDFS.
- Used Hive to do analysis on the data and identify different correlations.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Wrote Hive queries for data analysis to meet the business requirements.
- Involved in creating Hive tables and working on them using HiveQL.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Flume, Zookeeper, Cloudera, Oozie, Java, Oracle, PL/SQL, SQL, Windows NT, UNIX Shell Scripting.
Confidential
Java Developer
Responsibilities:
- Involved in the design and implementation of the architecture for the project using OOAD, UML, and design patterns.
- Involved in design and development of the server-side layer using XML, JSP, JDBC, JNDI, EJB, and DAO patterns in the Eclipse IDE.
- Work involved extensive usage of HTML, CSS, JavaScript, and Ajax for client-side development and validations.
- Used parsers for the conversion of XML files to Java objects and vice versa.
- Developed screens using XML documents and XSL.
- Developed client programs using Apache Axis to consume the web services published by the Country Defaults Department, which keeps track of information such as life span, inflation rates, and retirement age.
- Developed JavaBeans and JSPs using Spring and JSTL tag libraries for supplements.
- Developed EJBs, Servlets, and JSP files to implement business rules and security options using IBM WebSphere.
- Involved in creating tables, stored procedures in SQL for data manipulation and retrieval using SQL Server, Oracle and DB2.
- Trained end users on the developed application.
Environment: Java, JSF Framework, Eclipse IDE, Ajax, Apache Axis, OOAD and UML, WebLogic, JavaScript, HTML, XML, CSS, SQL Server, Oracle, Web services, Spring, Windows.
Confidential
Java Developer
Responsibilities:
- Participated in requirement gathering and converting the requirements into technical specifications.
- Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
- Designed the user interface and implemented validation checks using JavaScript.
- Involved in design of JSPs and Servlets for navigation among the modules.
- Developed various EJBs for handling business logic and data manipulations from database.
- Managed database connectivity using JDBC for querying, inserting, and data management, including triggers and stored procedures.
- Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
- Developed the XML Schema and web services for data maintenance and structures.
- Wrote test cases in JUnit for unit testing of classes.
- Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing solutions.
- Built and deployed Java applications into multiple UNIX-based environments and produced both unit and functional test results along with release notes.
- Developed the browser-facing presentation layer using CSS and HTML taken from Bootstrap.
Environment: Java, Spring, JSP, Hibernate, XML, HTML, JavaScript, JDBC, CSS, SOAP Web services.
