Hadoop Developer Resume
Plano, TX
SUMMARY
- Overall 8+ years of professional IT experience, with 5+ years in analysis, architectural design, prototyping, development, integration and testing of applications using Java/J2EE technologies and 3+ years in Big Data analytics as a Hadoop developer.
- Developed UML diagrams for object-oriented design: use cases, sequence diagrams and class diagrams, using Visual Paradigm and Visio.
- 3+ years of experience as a Hadoop developer, with good knowledge of Hadoop ecosystem technologies.
- Experience in developing MapReduce programs with Apache Hadoop to analyze big data as per the requirements (a representative sketch follows this summary).
- Experienced with major Hadoop ecosystem projects such as Pig, Hive and HBase, and with monitoring them using Hortonworks Ambari.
- Hands-on experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
- Hands-on experience in converting SQL to HiveQL and in performance tuning of HiveQL.
- Hands-on experience with NoSQL databases, including HBase and MongoDB, and their integration with a Hadoop cluster.
- Good working experience using Sqoop to import data from RDBMSs into HDFS and vice versa.
- Good knowledge of job scheduling and monitoring tools such as Oozie and ZooKeeper.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Apache, Hortonworks, Cloudera and AWS.
- Implemented Apache Kafka and Spark Streaming for real-time data processing.
- Extensive experience working in the financial and insurance industries.
- Working knowledge of databases such as Oracle 9i/10g/11g and Microsoft SQL Server.
- Experience in writing numerous test cases using the JUnit framework.
- Strong experience in database design, writing complex SQL Queries.
- Experience in building, deploying and integrating with Maven.
- Experience in developing logging standards and mechanisms based on Log4j.
- Strong work ethic with desire to succeed and make significant contributions to the organization.
- Worked in various software methodologies such as Agile, Waterfall and Prototyping.
- Have the motivation to take independent responsibility as well as ability to contribute and be a productive team member.
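As a representative sketch of the MapReduce development mentioned above: a minimal Java job that counts occurrences of an event type in line-oriented input. The class, field and path names are illustrative, not taken from any specific engagement below.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCount {

    // Mapper: emits (event type, 1) for each input line; the one-value-per-line
    // input layout is assumed for illustration.
    public static class EventMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(value.toString().trim()), ONE);
        }
    }

    // Reducer: sums the counts per event type.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event count");
        job.setJarByClass(EventCount.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class); // summing is associative, so the reducer doubles as a combiner
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```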
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce, YARN, Hive, Sqoop, Pig, HBase, Spark, Kafka, Oozie, ZooKeeper
Programming Languages: Java (JDK 1.6/1.7), Pig Latin, Shell Scripting, HTML, SQL
Hadoop Distributions: Hortonworks, Cloudera
Operating Systems: UNIX/Linux, Windows
Databases: NoSQL (HBase, MongoDB), Oracle 11g, Microsoft SQL Server 2008/2012
Tools: Eclipse (Kepler/Juno), TOAD, SQL Developer, Ant, Maven, Visio
PROFESSIONAL EXPERIENCE
Confidential, Plano, TX
Hadoop Developer
Responsibilities:
- Responsible for the design and development of Big Data applications using Hortonworks Hadoop.
- Coordinated with business customers to gather business requirements.
- Imported and exported data between HDFS and relational databases using Sqoop.
- Responsible for managing data coming from different sources.
- Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase, Spark and Sqoop.
- Developed Apache Spark jobs using Scala in the test environment for faster data processing, and used Spark SQL for querying.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
- Analyzed data using the Hadoop components Hive and Pig, and created Hive tables for end users.
- Involved in writing Hive queries and Pig scripts for data analysis to meet the business requirements.
- Wrote Oozie flows and shell scripts to automate the pipeline.
- Optimized MapReduce and Hive jobs to use HDFS efficiently by applying Gzip, LZO, Snappy and ORC compression techniques.
- Tuned Hive tables and queries to improve performance.
- Wrote algorithms to calculate the most valuable households based on data provided by external providers.
- Connected Hive tables to Trifacta for data exploration and wrangling.
- Involved in the design of the next phase (future-value households) for data analytics.
- Used Spark SQL to query Hive tables to increase performance (see the sketch following this list).
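A minimal sketch of the Spark SQL-over-Hive pattern described above, written against the Spark 2.x Java API (projects of this era may instead have used HiveContext); the database, table and column names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HouseholdValueQuery {
    public static void main(String[] args) {
        // Hive support lets Spark SQL read the same warehouse tables that HiveQL uses.
        SparkSession spark = SparkSession.builder()
                .appName("HouseholdValueQuery")
                .enableHiveSupport()
                .getOrCreate();

        // Former HiveQL runs unchanged through Spark SQL; names are placeholders.
        Dataset<Row> topHouseholds = spark.sql(
                "SELECT household_id, SUM(txn_amount) AS total_value "
              + "FROM warehouse.transactions "
              + "GROUP BY household_id "
              + "ORDER BY total_value DESC "
              + "LIMIT 100");

        topHouseholds.show();
        spark.stop();
    }
}
```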
Environment: MapReduce, HDFS, YARN, Hive, HBase, Pig, Sqoop, Spark, Oozie, Java, Python.
Confidential, Brooklyn, NY
Hadoop Developer
Responsibilities:
- Responsible for design and development of Big Data applications using Hortonworks Hadoop.
- Responsible for overall data migration and data management from Oracle database to HDFS using Sqoop.
- Created Managed tables and External tables in Hive and loaded data from HDFS.
- Tuned Hive tables using partitioning and bucketing to provide better performance for HiveQL queries.
- Converted SQL queries to HiveQL without any changes in the business logic.
- Used different data formats (Text format and ORC format) while loading the data into HDFS.
- Developed shell scripts to automate daily Sqoop jobs.
- Integrated HBase with Hive using the HBase storage handler for analytics.
- Developed UDFs to provide custom Hive and Pig capabilities.
- Performance-tuned Hive queries and configuration properties.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Monitored workload, job performance and capacity planning using Ambari.
- Handled direct interaction with the Hortonworks team for issue management and cluster balancing.
- Involved in the development of a Spark Streaming application with Kafka for near-real-time data processing (see the sketch following this list).
- Developed Spark code that integrates with Hive for analyzing historical data, using in-memory computing to speed up report generation.
- Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
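A minimal sketch of the Kafka-to-Spark-Streaming integration described above, using the spark-streaming-kafka-0-10 Java API; the broker address, topic, group id and batch interval are placeholders.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class NearRealTimeIngest {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("NearRealTimeIngest");
        // Micro-batches every 10 seconds; the interval is illustrative.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092"); // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "ingest-app");

        Collection<String> topics = Arrays.asList("events"); // placeholder topic

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Count records per micro-batch; a real job would parse and persist them.
        stream.map(record -> record.value())
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```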
Environment: MapReduce, HDFS, YARN, Hive, Sqoop, Kafka, Spark, Oozie, Java, Python.
Confidential, Bloomington, IL
Hadoop Developer
Responsibilities:
- Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Extensively involved in Design phase and delivered Design documents.
- Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase and Sqoop.
- Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Mapped the relational database architecture to Hadoop's file system and built databases on top of it using Cloudera Impala.
- Migrated huge amounts of data from different databases (Netezza, Oracle, SQL Server) to Hadoop.
- Developed Spark programs in Scala and Java.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Experienced in defining job flows.
- Involved in data migration from an Oracle database to MongoDB.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (see the sketch following this list).
- Experienced in managing and reviewing the Hadoop log files.
- Used Pig as an ETL tool to perform transformations, joins and pre-aggregations before storing the data in HDFS.
- Loaded and transformed large sets of structured and semi-structured data.
- Responsible for managing data coming from different sources.
- Utilized the Cloudera distribution of Apache Hadoop.
- Created the data model for Hive tables.
- Exported data from HDFS into an RDBMS using Sqoop for report generation and visualization purposes.
- Worked on Oozie workflow engine for job scheduling.
- Involved in Unit testing and delivered Unit test plans and results documents.
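A minimal sketch of the partitioned and bucketed Hive table design mentioned above, issued through the Hive JDBC driver; the HiveServer2 endpoint, credentials, table and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateClaimsTable {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // HiveServer2 host, database and credentials are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver:10000/default", "etl_user", "");
             Statement stmt = conn.createStatement()) {

            // Partitioning narrows scans to one load date; bucketing by
            // customer_id supports sampling and bucketed joins.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS claims ("
              + "  claim_id BIGINT, customer_id BIGINT, amount DOUBLE) "
              + "PARTITIONED BY (load_date STRING) "
              + "CLUSTERED BY (customer_id) INTO 32 BUCKETS "
              + "STORED AS ORC");
        }
    }
}
```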
Environment: Hadoop, MapReduce, YARN, HDFS, Hive, Pig, Impala, Kafka, Java (JDK 1.6), SQL, Oracle, Cloudera Manager, Sqoop, Flume, Oozie, Eclipse.
Confidential, Brooklyn, NY
Java/Hadoop Developer
Responsibilities:
- Hands on experience in loading data from UNIX file system to HDFS.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data from HBase through Sqoop into HDFS for further processing.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Involved in creating Hive tables, loading data and running Hive queries on that data.
- Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties and the Thrift server in Hive.
- Involved in writing, developing and testing optimized Pig Latin scripts.
- Working knowledge of writing Pig Load and Store functions.
- Developed Java MapReduce programs to transform log data into a structured form showing user location, age group and time spent.
- Developed optimal strategies for distributing the web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume, and stored the data in HDFS for analysis.
- Monitored multiple Hadoop clusters environments using Ganglia.
- Developed Pig scripts for the analysis of semi-structured data.
- Developed industry-specific UDFs (user-defined functions) (see the sketch following this list).
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most purchased product on the website.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Managed and scheduled jobs on a Hadoop cluster using Oozie.
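A minimal sketch of the kind of custom Hive UDF mentioned above, assuming an age-group bucketing rule like the one in the log analysis described earlier; the class name and thresholds are illustrative. It uses the classic UDF base class, which matches the Hive versions of this era.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Scalar Hive UDF that buckets an age value into a reporting age group.
public final class AgeGroup extends UDF {
    public Text evaluate(IntWritable age) {
        if (age == null) {
            return null; // preserve SQL NULL semantics
        }
        int a = age.get();
        if (a < 18) return new Text("minor");
        if (a < 35) return new Text("young-adult");
        if (a < 65) return new Text("adult");
        return new Text("senior");
    }
}
```

Once packaged in a JAR, the function is registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION age_group AS 'AgeGroup', after which it can be used like any built-in function in HiveQL.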
Environment: Apache Hadoop 1.0.1, MapReduce, HDFS, HBase, Hive, Pig, Oozie, Flume, Sqoop, Ganglia, Java, Eclipse.
Confidential
Java Developer
Responsibilities:
- Involved in development of business domain concepts into Use Cases, Sequence Diagrams, Class Diagrams, Component Diagrams and Implementation Diagrams.
- Implemented various J2EE Design Patterns such as Model-View-Controller, Data Access Object, Business Delegate and Transfer Object.
- Responsible for analysis and design of the application based on MVC Architecture, using open source Struts Framework.
- Wrote stored procedures and used Java APIs (JDBC) to call them (see the sketch following this list).
- Developed various test cases, such as unit tests, mock tests and integration tests, using JUnit.
- Used Log4j to perform logging in the applications.
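A minimal sketch of calling a stored procedure through the JDBC CallableStatement API, as referenced above; the connection string, procedure name and parameters are placeholders.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class ProcedureCaller {
    public static void main(String[] args) throws Exception {
        // Connection details and procedure signature are hypothetical.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:ORCL", "app_user", "secret");
             CallableStatement call = conn.prepareCall("{call GET_ACCOUNT_BALANCE(?, ?)}")) {

            call.setLong(1, 1001L);                      // IN: account id
            call.registerOutParameter(2, Types.NUMERIC); // OUT: balance
            call.execute();

            System.out.println("Balance: " + call.getBigDecimal(2));
        }
    }
}
```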
Environment: Java, J2EE, Struts MVC, Tiles, JDBC, JSP, JavaScript, HTML, Spring IoC, Spring AOP, JAX-WS, Ant, WebSphere Application Server, Oracle, JUnit, Log4j, Eclipse
Confidential
Programmer Analyst
Responsibilities:
- Involved in the design and development phases of the Rational Unified Process (RUP).
- Involved in the creation of UML diagrams such as class, activity and sequence diagrams using the modeling tools of IBM Rational Rose.
- Involved in bug fixing of various modules raised by the testing teams during the integration testing phase.
- Involved and participated in code reviews.
- Used the Log4j logging framework for logging messages (see the sketch following this list).
- Used Rational ClearQuest for bug tracking.
- Involved in the deployment of the application on IBM WebSphere Application Server.
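A minimal sketch of the Log4j 1.x usage referenced above; the class name and messages are illustrative.

```java
import org.apache.log4j.Logger;

public class PaymentService {
    private static final Logger LOG = Logger.getLogger(PaymentService.class);

    public void process(String orderId) {
        LOG.info("Processing order " + orderId);
        try {
            // business logic would go here
        } catch (Exception e) {
            // log the failure with the stack trace attached
            LOG.error("Failed to process order " + orderId, e);
        }
    }
}
```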
Environment: Java, J2EE, Hibernate, XML, XML Schemas, JSP, HTML, CSS, PL/SQL, JUnit, Log4j.