Sr. Hadoop Developer Resume
Austin, TX
SUMMARY:
- Around 9 years of experience in the analysis, design, development, maintenance, and support of web and client-server applications built on Java/J2EE technologies, including 6+ years with Big Data and Hadoop components such as HDFS, MapReduce, Pig, Hive, YARN, Sqoop, Flume, Spark, Scala, and Kafka.
- Experience with multiple Hadoop distributions, including Cloudera and Hortonworks.
- Excellent understanding of NoSQL databases like HBase, Cassandra and MongoDB.
- Experience working with structured and unstructured data in various file formats, such as Avro data files, XML files, JSON files, and sequence files, using MapReduce programs.
- Work experience with cloud platforms such as Amazon Web Services (AWS).
- Implemented custom business logic and performed join optimization, secondary sorting, and custom sorting in MapReduce programs (a brief sketch follows this summary).
- Experienced in testing and running MapReduce pipelines.
- Expertise in data ingestion using Sqoop, Apache Kafka, Spark Streaming, and Flume.
- Implemented business logic using Pig scripts. Wrote custom Pig UDFs to analyze data.
- Performed ETL operations using Pig to join, clean, aggregate, and analyze data.
- Hands-on experience streaming live data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
- Experience in designing, configuring, and installing DataStax Cassandra.
- Good understanding of Conceptual, Logical and Physical Data Modeling.
- Experience with the Oozie workflow engine to automate and parallelize Hadoop MapReduce and Pig jobs.
- Extensive experience writing SQL queries in HiveQL to perform analytics on data.
- Experience in performing data validation using Hive dynamic partitioning and bucketing.
- Experienced in importing and exporting data between RDBMS/Teradata and HDFS using Sqoop.
- Experienced in handling streaming data, such as web server logs, using Flume.
- Good knowledge of analyzing data with Python scripts via Hadoop Streaming.
- Worked with Spark DataFrames, Spark SQL, and Spark MLlib extensively.
- Experience in implementing Spark using Scala and Spark SQL for faster data processing.
- Extensive hands-on experience accessing and performing CRUD operations against HBase data using the Java API, and implementing time-series data management.
- Hands-on experience with message brokers such as Apache Kafka.
- Involved in planning the stages of migrating data from RDBMS to Cassandra.
- Expertise in benchmarking and load testing Cassandra clusters using the cassandra-stress tool.
- Involved in various data mining tasks such as pattern mining, classification, and clustering.
- Experienced in J2EE, Spring, Hibernate, SOAP/REST web services, JMS, JNDI, and EJB.
- Expertise with application servers and web servers such as Oracle WebLogic, IBM WebSphere, Apache Tomcat, JBoss, and VMware.
- Experienced in developing unit test cases using MRUnit and JUnit.
- Experience in using Maven and Ant for build automation.
- Experience working in environments using Agile (Scrum) and Waterfall methodologies.
- Expertise in database modeling, administration, and development using SQL and PL/SQL in Oracle (8i, 9i, and 10g), MySQL, DB2, and SQL Server environments.
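A brief, illustrative sketch of the MapReduce secondary-sort pattern mentioned above: a composite key plus a grouping comparator, assuming nothing beyond the standard Hadoop APIs. Class and field names (CompositeKey, eventTime) are hypothetical, not taken from any specific project.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    // Composite key: natural key plus a secondary field, so the shuffle
    // sorts values within each group before reduce() is called.
    public class CompositeKey implements WritableComparable<CompositeKey> {
        private final Text naturalKey = new Text();
        private final LongWritable eventTime = new LongWritable(); // hypothetical secondary field

        public Text getNaturalKey() { return naturalKey; }

        public void set(String key, long time) {
            naturalKey.set(key);
            eventTime.set(time);
        }

        @Override
        public void write(DataOutput out) throws IOException {
            naturalKey.write(out);
            eventTime.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            naturalKey.readFields(in);
            eventTime.readFields(in);
        }

        // Full ordering: natural key first, then the secondary field.
        @Override
        public int compareTo(CompositeKey other) {
            int cmp = naturalKey.compareTo(other.naturalKey);
            return cmp != 0 ? cmp : eventTime.compareTo(other.eventTime);
        }
    }

    // Grouping comparator (a separate file in practice): reducers group on the
    // natural key only, wired up via job.setGroupingComparatorClass(...).
    // A custom Partitioner that hashes only the natural key completes the pattern.
    class NaturalKeyGroupingComparator extends WritableComparator {
        protected NaturalKeyGroupingComparator() {
            super(CompositeKey.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return ((CompositeKey) a).getNaturalKey()
                    .compareTo(((CompositeKey) b).getNaturalKey());
        }
    }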
TECHNICAL SKILLS:
Big Data / Hadoop: HDFS / MapReduce / Hive / Pig / HBase / YARN / Sqoop / Flume / Oozie / Scala / Kafka / Apache Spark / Spark SQL / AWS / Talend.
Databases / NoSQL: Cassandra / MongoDB / HBase / Hive / SQL / PL/SQL / Oracle.
Web Technologies: HTML / CSS / AJAX / JavaScript / jQuery.
Web Services: SOAP / REST / XML / XSD.
J2EE Frameworks: Hibernate / Spring / JMS / JSF.
Operating Systems: Windows / Unix / Linux.
Methodologies: Agile, Waterfall.
IDEs / Tools: Eclipse / NetBeans / Microsoft Visio.
Build Tools / Logging: Maven / Apache Ant / Log4j.
PROFESSIONAL EXPERIENCE:
Confidential, Austin, TX.
Sr. Hadoop Developer
Responsibilities:
- Designed a pipeline to collect, clean, and prepare data for analysis using MapReduce, Spark, Pig, Hive, and HBase, with reporting in Tableau.
- Developed and implemented a script to send large volumes of data to any HTTP server, configurable by number of users, operations, and date range.
- Created reports in Tableau from data queried with HiveQL.
- Created/modified UDFs and UDAFs for Hive and Pig whenever necessary.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Used Apache Kafka for handling log messages that are handled by multiple systems.
- Worked on data staging validation using Talend.
- Involved in unit testing and integration testing with Hue.
- Worked with Spark DataFrames, Spark SQL, and Spark MLlib extensively.
- Worked with the data science team to develop Spark MLlib applications for various predictive models.
- Worked with Kafka streaming to fetch data in real time (see the sketch following this list).
- Hands-on experience importing and exporting data between relational databases and HDFS using Sqoop.
- Worked on Impala for creating tables and querying data.
- Implemented daily workflow for extraction, processing and analysis of data with Oozie.
- Processed the source data into structured form and stored it in Cassandra.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Involved in creating Hive tables and loading with data.
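A minimal sketch of the Kafka-to-HDFS streaming flow referenced above, written against Spark's Structured Streaming Java API (the original project may equally have used DStreams); the broker, topic, and paths are hypothetical.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class KafkaToHdfsStream {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("kafka-to-hdfs")
                    .getOrCreate();

            // Read the raw Kafka stream; key and value arrive as binary columns.
            Dataset<Row> events = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical broker
                    .option("subscribe", "events")                     // hypothetical topic
                    .load()
                    .selectExpr("CAST(value AS STRING) AS json");

            // Persist micro-batches to HDFS as Parquet, with checkpointing for recovery.
            StreamingQuery query = events.writeStream()
                    .format("parquet")
                    .option("path", "hdfs:///data/events")                      // hypothetical path
                    .option("checkpointLocation", "hdfs:///checkpoints/events") // hypothetical path
                    .start();
            query.awaitTermination();
        }
    }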
Environment: Hortonworks HDP, Java, Kafka, Pig, Hive, HDFS, Cassandra, UNIX, Spark, Scala, HBase, HiveQL, AWS, Tableau.
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Involved in benchmarking the Cassandra cluster for performance using Cassandra-stress tool.
- Supported MapReduce programs running on the cluster.
- Installed and configured Pig, experienced in writing Pig Latin scripts.
- Wrote Hive queries for data analysis to meet the business requirements.
- Suggested data modeling and performance tuning techniques.
- Configured internode communication between Cassandra nodes and client using SSL encryption.
- Developed Java MapReduce programs to transform mainframe data into a structured format.
- Involved in installing Hadoop Ecosystem components.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Developed optimal strategies for distributing the mainframe data over the cluster; imported and exported the stored mainframe data into HDFS and Hive.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries (see the sketch following this list).
- Used the HBase API to store data from Hive tables into HBase tables.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
- Created Hive tables and worked with them using HiveQL.
- Conducted a POC for Hadoop and Spark as part of the NextGen platform implementation.
- Involved in Cassandra database schema design.
- Worked on migrating data from relational databases to Cassandra.
- Used Storm as an automatic retry mechanism for downloading and manipulating data after transient failures.
- Used Storm to analyze large volumes of non-unique data points with low latency and high throughput.
- Attended weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
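A minimal sketch of a Hive generic UDF of the kind referenced above; the function name, normalization logic, and example table are hypothetical. Such a class would be registered with CREATE TEMPORARY FUNCTION and invoked as, e.g., SELECT normalize_str(name) FROM customers.

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.io.Text;

    // Hypothetical generic UDF that trims and upper-cases a string column.
    public class NormalizeStringUDF extends GenericUDF {
        private transient ObjectInspectorConverters.Converter converter;
        private final Text result = new Text();

        @Override
        public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
            if (args.length != 1) {
                throw new UDFArgumentException("normalize_str() takes exactly one argument");
            }
            // Convert whatever primitive type arrives into a Java String.
            converter = ObjectInspectorConverters.getConverter(
                    args[0], PrimitiveObjectInspectorFactory.javaStringObjectInspector);
            return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
        }

        @Override
        public Object evaluate(DeferredObject[] args) throws HiveException {
            Object raw = args[0].get();
            if (raw == null) {
                return null; // preserve SQL NULL semantics
            }
            String s = (String) converter.convert(raw);
            result.set(s.trim().toUpperCase());
            return result;
        }

        @Override
        public String getDisplayString(String[] children) {
            return "normalize_str(" + children[0] + ")";
        }
    }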
Environment: CDH4, Java 6, MapReduce, HDFS, Hive, Spark, Scala, Cassandra, Pig, Linux, XML, MySQL, MySQL Workbench, Cloudera, Maven, Eclipse, PL/SQL, SQL connector.
Confidential, Long Beach, CA
Hadoop Developer
Responsibilities:
- Experience using Avro, Parquet, RCFile, and JSON file formats; developed UDFs for Hive and Pig.
- Developed a MapReduce input format to read specific data formats.
- Developed and maintained workflow scheduling jobs in Oozie.
- Used Sqoop to transfer data from external sources to HDFS.
- Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, tuple stores, NoSQL, Hadoop, Pig, MySQL, and Oracle.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience in loading and transforming huge sets of structured, semi-structured, and unstructured data.
- Responsible for loading unstructured and semi-structured data into Hadoop cluster coming from different sources using Flume.
- Worked on different file formats like XML files, Sequence files, CSV and Map files.
- Continuously monitored and managed Hadoop cluster using Cloudera Manager.
- Performed POCs using newer technologies such as Spark, Kafka, and Scala.
- Created Hive tables, loaded them with data and wrote hive queries.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Experience in managing and reviewing Hadoop log files.
- Analyzed web logs using Hadoop tools for operational and security-related activities.
- Used all of Pig's complex data types for handling data.
- Developed efficient MapReduce programs in Java for filtering out unstructured data (see the sketch following this list).
- Supported MapReduce programs running on the cluster.
- Managed and reviewed Hadoop log files to identify issues when jobs fail.
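A minimal sketch of the filtering MapReduce pattern referenced above: a map-only pass that keeps well-formed records and counts malformed ones. The delimiter and field count are hypothetical, not from the original project.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class FilterMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Keep only records that split into the expected number of fields.
            String[] fields = value.toString().split("\\|", -1); // hypothetical '|' delimiter
            if (fields.length == 12) {                           // hypothetical schema width
                context.write(value, NullWritable.get());
            } else {
                // Track data-quality issues without failing the job.
                context.getCounter("quality", "malformed").increment(1);
            }
        }
    }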
Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Java, XML, SQL, MySQL, Scala, Sqoop, Oozie.
Confidential, Cupertino, CA
Hadoop Consultant
Responsibilities:
- Involved in creating Hive tables, loading with data and writing hive queries.
- Involved in data ingestion into HDFS using Sqoop and Flume from a variety of sources.
- Developed MapReduce programs to parse the raw data, populate tables and store the refined data in partitioned tables.
- Installed and configured Hadoop and the Hadoop stack on a 4-node cluster.
- Experienced in managing and reviewing application log files.
- Ingested application logs into HDFS and processed them using MapReduce jobs.
- Created and maintained the Hive warehouse for Hive analysis.
- Generated test cases for new MR jobs.
- Led and programmed the recommendation logic for various clustering and classification algorithms using Java.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Responsible for design and creation of Hive tables, partitioning, bucketing, loading data and writing hive queries.
- Created HBase tables to store various formats of personally identifiable information coming from different portfolios (see the sketch following this list).
- Involved in managing and reviewing Hadoop log files.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
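A minimal sketch of writing a row through the HBase Java client API, as used for the tables referenced above; the table name, row key, column family, and qualifiers are hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // picks up hbase-site.xml
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("pii_records"))) { // hypothetical table
                Put put = new Put(Bytes.toBytes("portfolio1#cust-001"));          // hypothetical row key
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("name"), Bytes.toBytes("Jane Doe"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("format"), Bytes.toBytes("json"));
                table.put(put);
            }
        }
    }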
Environment: HDFS, Hive, Scala, MapReduce, Storm, Java, HBase, Pig, Sqoop, Shell scripts, Oozie, MySQL, Tableau, Eclipse, Web services, Oracle 11g/SQL, JDBC and WebSphere Application Server.
Confidential, Fremont, CA
Software Engineer
Responsibilities:
- Worked on marshalling and unmarshalling XML using the JiBX parser.
- Interpreted and manipulated Spring and Hibernate configuration files.
- Worked on JMS and Messaging Queue (MQ) configurations.
- Designed and developed GUI Screens for user interfaces using JSP, JavaScript, XSLT, AJAX, XML, HTML, CSS, JSON.
- Configured, designed, implemented, and monitored Kafka clusters and connectors.
- Generated class diagrams and sequence diagrams extensively for the entire process flow using RAD.
- Consumed external web services by creating service contracts through WSRR from different development centers.
- Worked on SOAP based Web services, tested Web Services using SOAP UI.
- Used Jenkins to build the application on the server.
- Developed documentation for QA Environment.
- Loaded records from the legacy database into Cassandra (see the sketch following this list).
- Synchronized creates, updates, and deletes of records between the legacy database and Cassandra.
- Created stored procedures, SQL Statements and triggers for the effective retrieval and storage of data into database.
- Developed the application using Agile (Scrum) and iterative processes.
- Used Apache Log4j logging API to log errors and messages.
- Deployed applications in UNIX environments for Dev and QA-Smoke.
- Unit tested the application using JUnit and EasyMock.
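A minimal sketch of the legacy-to-Cassandra load referenced above, assuming a JDBC source and the DataStax Java driver 3.x; the connection strings, keyspace, and columns are hypothetical.

    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class LegacyToCassandra {
        public static void main(String[] args) throws Exception {
            try (java.sql.Connection jdbc = DriverManager.getConnection(
                         "jdbc:oracle:thin:@//legacy-host:1521/LEGACYDB", "app", "secret"); // hypothetical source
                 Cluster cluster = Cluster.builder().addContactPoint("cassandra-node1").build();
                 Session session = cluster.connect("legacy_ks")) { // hypothetical keyspace

                PreparedStatement insert =
                        session.prepare("INSERT INTO customers (id, name) VALUES (?, ?)");

                // Stream legacy rows and write each into Cassandra.
                try (Statement st = jdbc.createStatement();
                     ResultSet rs = st.executeQuery("SELECT id, name FROM customers")) {
                    while (rs.next()) {
                        session.execute(insert.bind(rs.getLong("id"), rs.getString("name")));
                    }
                }
            }
        }
    }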
Environment: JDK, Spring Framework, XML, HTML, Cassandra, JSP, Hibernate, ANT, JavaScript, XSLT, CSS, AJAX, JMS, SOAP Web Services, WebSphere Application Server, PL/SQL, JUnit, Log4j, Shell scripting, UNIX.
Confidential, Long Beach, CA
Java/J2EE Developer
Responsibilities:
- Developed front-end screens using JSP, HTML and CSS.
- Developed server-side code using Struts and Servlets.
- Developed core java classes for exceptions, utility classes, business delegate, and test cases.
- Developed SQL queries against MySQL and established connectivity (see the sketch following this list).
- Worked with Eclipse using the Maven plugin for the Eclipse IDE.
- Designed the user interface of the application using HTML5, CSS3, JSF 2.0, JSP, and JavaScript.
- Tested the application functionality with JUnit Test Cases.
- Developed all the User Interfaces using JSP and Struts framework.
- Wrote client-side validations using JavaScript.
- Extensively used jQuery to develop interactive web pages.
- Experience in developing web services for production systems using SOAP and WSDL.
- Developed the user interface presentation screens using HTML, XML, and CSS.
- Experience working with Spring using AOP, IoC, and the JDBC template.
- Developed shell scripts to trigger the Java batch job and send summary emails with the batch job status and processing summary.
- The application was developed in Eclipse IDE and was deployed on Tomcat server.
- Supported bug fixes and functionality changes.
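A minimal sketch of the MySQL connectivity referenced above, using plain JDBC with a PreparedStatement; the URL, credentials, and schema are hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class CustomerDao {
        private static final String URL = "jdbc:mysql://localhost:3306/appdb"; // hypothetical database

        public String findName(long id) throws Exception {
            // Parameterized query avoids SQL injection and statement re-parsing.
            try (Connection conn = DriverManager.getConnection(URL, "app", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT name FROM customers WHERE id = ?")) {
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString("name") : null;
                }
            }
        }
    }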
Environment: Java, Struts 1.1, Servlets, JSP, HTML, CSS, JavaScript, Eclipse 3.2, Tomcat, Maven, MySQL, Windows and Linux, JUnit.