Hadoop Developer Resume
Richardson, TX
SUMMARY
- IT professional with over 7 years of experience implementing a range of technologies across all phases of the software development life cycle, delivering solutions to clients
- Worked extensively as a Hadoop Developer for 3 years, implementing Big Data applications using components such as Pig, Hive, MapReduce, Spark, and Spark SQL
- More than 4 years of experience in Enterprise Application Development (Back end and Front end) in Java/J2EE technologies
- Experience in working in diverse domains like Healthcare, Telecom and Banking
- Experience in working on Cloudera, Hortonworks and MapR Hadoop distributions
- Expertise in developing custom functionality to meet business needs by creating UDFs in Pig and Hive (see the sketch following this summary)
- Strong knowledge of Hadoop architecture and its daemons, including NameNode, DataNode, Secondary NameNode, JobTracker, TaskTracker, and YARN
- Expertise in using Talend integrated with the Cloudera Hadoop distribution for ETL and data lake workloads
- Worked on Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, DataFrames, RDDs, and Spark on YARN
- Experience in developing PIG scripts for data flow and transformation activities
- Expertise in creating managed tables, external tables, and views in Hive, and in analyzing large datasets using HiveQL
- Stored data in tabular formats using Hive tables and Hive SerDes
- Working knowledge on using different NoSQL databases such as HBase, MongoDB, Cassandra
- Implemented Sqoop for loading data from RDBMS sources (SQL Server, Oracle) into HDFS
- Imported data into HDFS and the data lake from various streaming systems using Spark Streaming and Flume
- Good knowledge of using Oozie and ZooKeeper for workflow scheduling and monitoring activities on the cluster
- Computed indexed views for data exploration using Apache Solr
- Experience implementing a distributed messaging queue integrated with Cassandra using Apache Kafka and ZooKeeper
- Expert in designing and creating data ingest pipelines using technologies such as Spring Integration and Apache Storm with Kafka
- Working knowledge of Python and shell scripting
- Implemented combiners in MapReduce to increase efficiency and tune job performance
- Expertise in optimizing Hive tables using techniques such as static partitioning, dynamic partitioning, and bucketing
- Expertise in debugging and resolving issues in batches and scripts on a Hadoop cluster
- Worked with SequenceFile, RCFile, and ORC formats for loading data into HDFS, and the Parquet format for Spark SQL
- Hands-on development experience with RDBMS, including writing SQL queries, PL/SQL, views, stored procedures, triggers, and cursors
- Used frameworks and technologies such as Spring, Hibernate, Servlets, JSP, EJB, JMS, JDBC, SOA, web services, SOAP, and REST
- Experience using the Eclipse, NetBeans, IDLE, and Anaconda IDEs for Java and Python programming
- Worked extensively and implemented best practices in different software development lifecycles such as Waterfall and Agile Scrum
- Good knowledge of application design using Unified Modeling Language (UML): sequence diagrams, class diagrams, data flow diagrams, and entity relationship diagrams
- Excellent communication and interpersonal skills, along with strong motivation for optimal project delivery
- Strong problem-solving skills, with the ability to organize and prioritize tasks; a noted team player
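The Hive UDF work noted above generally follows the pattern below; this is a minimal illustrative sketch in Java, and the class name, function name, and normalization rule are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF that normalizes phone numbers to digits only.
// Registered in Hive with: CREATE TEMPORARY FUNCTION normalize_phone AS '...';
public final class NormalizePhoneUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        // Strip every non-digit character, e.g. "(972) 555-0101" -> "9725550101"
        return new Text(input.toString().replaceAll("[^0-9]", ""));
    }
}
```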
TECHNICAL SKILLS
Hadoop Ecosystem: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kafka, Spark, Storm, ZooKeeper, Oozie, Tez, Impala, Mahout, Solr
Programming Languages: Java, Python, PL/SQL, PIG Latin, HiveQL, Spark SQL, C
Hadoop Distributions: Cloudera, Hortonworks
IDE: Eclipse, NetBeans, IDLE, SQL Developer, Intellij, Anaconda
Relational Databases: SQL Server, MySQL, Oracle 10g/11g
NoSQL Databases: HBase, MongoDB, Cassandra
Web Technologies: HTML5, CSS3, JavaScript, jQuery, AJAX, Servlets, JSP, JSON, XML, JSF
Application Frameworks: Hibernate, Spring, Struts, JMS, EJB, JUnit, MRUnit
Web Services: SOAP, REST, WSDL, JAXB, and JAXP
Application Servers: Tomcat, WebLogic, WebSphere
Scripting Languages: Python, Shell
Visualization Tools: Tableau, R, MS Excel
Methodologies: Waterfall, Agile
Operating Systems: Microsoft Windows, Linux, Unix, Ubuntu
PROFESSIONAL EXPERIENCE
Confidential - Richardson, TX
Hadoop Developer
Environment: Cloudera Distribution Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Cassandra, Storm, Solr, Scala, Spark, Oozie, Kafka, Linux, Java (JDK), Tableau, Eclipse, HDFS, MySQL
Responsibilities:
- Responsible for managing data from different sources; involved in HDFS maintenance and loading of structured and unstructured data
- Imported and exported data between RDBMS and Hive using Sqoop
- Partitioned Hive tables with both static and dynamic partitions as needed
- Optimized Hive analytics queries to improve job performance
- Created and ran Sqoop jobs with incremental load to populate Hive external tables
- Developed Pig scripts in areas where extensive hand-written code needed to be reduced
- Loaded and transformed large sets of structured, unstructured, and semi-structured data into HDFS
- Wrote Pig scripts to transform raw data from several data sources into baseline data
- Used Flume to collect, aggregate, and store the log data from different web servers
- Created HBase tables to store data arriving in variable formats from different portfolios
- Developed MapReduce programs to load data from system-generated log files into HBase
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior for targeted advertising
- Applied partitioning and bucketing concepts in Hive and designed both managed and external tables to optimize performance
- Migrated MapReduce programs to Spark transformations using Spark and Scala
- Designed the technical solution for real-time analytics on streaming data using Kafka and Spark Streaming (see the sketch following this list)
- Helped the team grow the cluster from 20 to 45 nodes; configuration of the additional data nodes was managed using Puppet
- Solved performance issues in Hive and Pig scripts by understanding joins, grouping, and aggregation and how they translate to MapReduce jobs
- Designed a conceptual model with Spark for performance optimization
- Developed Oozie workflow for scheduling and orchestrating the ETL process
- Developed MapReduce programs to parse the raw data and store the refined data in tables
- Analyzed data with Hive, Pig, and Hadoop Streaming
- Created the Cassandra data model from the existing Oracle data model
- Worked with CQL to execute queries on the data persisted in the Cassandra cluster
- Used the Hive data warehouse tool and developed Hive queries to analyze data migrated to HDFS
- Used Tableau for visualization and report generation
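A minimal Java sketch of a Kafka-to-Spark-Streaming pipeline of the kind referenced above (spark-streaming-kafka-0-10 API); the broker address, topic, and consumer group are hypothetical, and the job simply counts events per batch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public final class ClickStreamJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("ClickStreamJob");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");            // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "clickstream-consumers");            // hypothetical group

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                ssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("clickstream"), kafkaParams));

        // Count events per 10-second batch and print a summary to the driver log
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        ssc.start();
        ssc.awaitTermination();
    }
}
```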
Confidential - Dallas, TX
Hadoop Developer
Environment: Cloudera Distribution, Hive, MapReduce, Pig, Impala, HDFS, Kafka, Sqoop, Flume, HBase, Oozie, Tableau, Java, AWS
Responsibilities:
- Worked on a Hadoop environment with MapReduce, Kafka, Sqoop, Oozie, Flume, HBase, Pig, Hive, and Impala on a multi-node cloud deployment
- Configured the Hadoop environment in the cloud through Amazon Web Services (AWS) to provide a scalable, distributed data solution
- Installed Kafka on the Hadoop cluster and used it for streaming and cleansing raw data; extracted useful information using Hive and stored the results in HBase
- Developed Kafka producers that compress and bundle many small files into larger Avro and SequenceFile outputs before writing to HDFS, making the best use of the Hadoop block size
- Implemented MapReduce jobs to parse raw weblogs into delimited records, handling files in various formats such as JSON, XML, and text
- Improved MapReduce job performance by using combiners, partitioning, and the distributed cache (see the sketch following this list)
- Exposure to iterative processing with Spark
- Created partitioned tables in Hive for best performance and faster querying
- Used Sqoop scripts to import data from various database sources into HBase, incrementally loading customer transaction data by date
- Used Flume to move log files generated by various sources into Amazon S3 for data processing
- Performed extensive data analysis using Hive and Pig
- Produced simple as well as complex results using Hive, improving performance and reducing query time by creating partitioned tables
- Created Oozie workflows to automate loading data into Amazon S3 and preprocessing it with Pig; also used Oozie for data scrubbing and processing
- Developed and deployed scripts to preprocess the data before moving it to HDFS
- Worked on a proof of concept with Impala
- Used Synergy for version control and ClearQuest for logging defects and tasks
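A minimal Java sketch of the combiner technique noted above: a word-count-style MapReduce job over weblogs in which the reducer also serves as the combiner, so partial sums are aggregated on the map side before the shuffle. The job name, input layout, and field positions are hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical job that counts weblog hits per URL.
public final class HitCountJob {

    public static class HitMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text url = new Text();

        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t");   // assumes tab-delimited logs
            if (fields.length > 0) {
                url.set(fields[0]);
                ctx.write(url, ONE);
            }
        }
    }

    public static class HitReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "hit-count");
        job.setJarByClass(HitCountJob.class);
        job.setMapperClass(HitMapper.class);
        job.setCombinerClass(HitReducer.class);   // combiner cuts shuffle volume
        job.setReducerClass(HitReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```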
Confidential - Jacksonville, FL
Hadoop Developer
Environment: Hortonworks Distribution Hadoop, HDFS, MapReduce, Hive, Flume, HBase, Sqoop, Pig, Java (JDK 1.6), Eclipse, MySQL, Ubuntu, ZooKeeper, Oozie, Apache Kafka, Apache Storm
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing
- Developed simple to complex MapReduce jobs using Hive and Pig
- Involved in creating Hive tables, loading data, and analyzing it using Hive queries
- Involved in running Hadoop jobs for processing millions of records of text data
- Responsible for managing data from multiple sources
- Implemented business logic using Pig scripts
- Assisted in exporting analyzed data to relational databases using Sqoop
- Involved in loading data from UNIX file system to HDFS
- Created HBase tables to store different data formats (see the sketch following this list)
- Managed and reviewed Hadoop log files
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team
- Analyzed large datasets to determine the optimal way to aggregate and report on them
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop
- Established connections to ingest data into and out of HDFS
- Monitored jobs on Informatica monitoring tool
- Fetched data from Oracle and wrote it into HDFS
- Used Hive connections to analyze data from Oracle
- Extensive knowledge of debugging MapReduce programs and Hive UDFs using Eclipse
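A minimal Java sketch of writing a record into an HBase table like those mentioned above; it uses the newer HBase client API, and the table name, column family, and row key scheme are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical example of writing a parsed log record into an HBase table.
// Table "weblogs" with column family "d" is assumed to exist.
public final class HBaseWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("weblogs"))) {

            // Row key combines user id and timestamp so scans by user stay contiguous
            Put put = new Put(Bytes.toBytes("user123_20150601T120000"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("url"), Bytes.toBytes("/home"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("200"));
            table.put(put);
        }
    }
}
```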
Confidential - St. Louis, MO
Java Developer
Environment: Java/J2EE, JSP, CSS, HTML, PHP, JavaScript, AJAX, Hibernate, Spring 2.5, XML, Web Services, Oracle 9i
Responsibilities:
- Involved in software development and production support for web-based front-end applications
- Involved in developing CSV files for the data load process
- Implemented the Data Access Object (DAO) adapter pattern so the business layer communicates with the database through the HibernateTemplate class (see the sketch following this list)
- Responsible for database design and for writing back-end procedures using SQL and PL/SQL in the Oracle database
- Used WSDL and SOAP to implement web services, optimizing performance for remote model applications
- Developed the service layer forming the business logic of the MVC-based Spring architecture
- Responsible for all aspects of CMS development and administration using OpenText
- Updating and maintaining all CMS content
- CMS development, code customization, and administration of all Web Servers
- Involved in configuration and deployment of front-end application on RAD
- Involved in developing JSPs for the graphical user interface
- Developed the UI using JSP, PHP, HTML and JavaScript
- Implemented code for validating the input fields and displaying the error messages
- Performed unit testing using JUnit test cases
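A minimal sketch of the DAO-over-HibernateTemplate pattern referenced above (Spring 2.5 / Hibernate 3 era). The entity, query, and class names are hypothetical and shown only for illustration.

```java
import java.util.List;
import org.springframework.orm.hibernate3.support.HibernateDaoSupport;

// Hypothetical DAO built on Spring's HibernateTemplate; the Account entity's
// Hibernate mapping is assumed to be configured elsewhere.
public class AccountDao extends HibernateDaoSupport {

    public void saveAccount(Account account) {
        // HibernateTemplate manages the session and translates exceptions
        getHibernateTemplate().saveOrUpdate(account);
    }

    @SuppressWarnings("unchecked")
    public List<Account> findByOwner(String ownerId) {
        // Positional HQL parameter; the template binds it and runs the query
        return (List<Account>) getHibernateTemplate()
                .find("from Account a where a.ownerId = ?", ownerId);
    }

    // Hypothetical mapped entity, included only so the sketch is self-contained
    public static class Account {
        private Long id;
        private String ownerId;
        // getters/setters omitted for brevity
    }
}
```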
Confidential - Minneapolis, MN
SQL/Java Developer
Environment: Java/J2EE, JSP, CSS, JavaScript, AJAX, Hibernate, Spring 3.0, XML, Web Services, SOAP, RESTful, Maven, Rational Rose, HTML, Log4j, JBoss 4
Responsibilities:
- Analysis and understanding of business requirements
- Developed views and controllers for the client and manager modules using Spring MVC 3.0 and Spring Core 3.0 (see the sketch following this list)
- Implemented business logic using Spring Core 3.0 and Hibernate
- Performed data operations using Spring ORM wired with Hibernate; implemented HibernateTemplate and the Criteria API for querying the database
- Developed an exception handling framework and used Log4j for logging
- Developed Web Services using XML messages that use SOAP
- Developed Web Services for Payment Transaction and Payment Release
- Developed RESTful web services
- Created WSDL and the SOAP envelope
- Developed and modified database objects as per the requirements
- Involved in unit and integration testing, bug fixing, acceptance testing with test cases, and code reviews
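A minimal sketch of a Spring MVC 3.0 controller along the lines described above; the request mapping, service contract, and view name are illustrative placeholders.

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

// Hypothetical controller for the client module.
@Controller
@RequestMapping("/clients")
public class ClientController {

    // Placeholder service contract, assumed to be implemented as a Spring bean
    public interface ClientService {
        Object findById(long id);
    }

    private final ClientService clientService;

    @Autowired
    public ClientController(ClientService clientService) {
        this.clientService = clientService;
    }

    @RequestMapping(value = "/{id}", method = RequestMethod.GET)
    public String viewClient(@PathVariable("id") long id, Model model) {
        model.addAttribute("client", clientService.findById(id));
        return "clientDetail";   // logical view name resolved by the configured ViewResolver
    }
}
```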
Confidential
Java Developer
Environment: Java 6.0, J2EE, Eclipse IDE, JSP 2.0, JDBC 3.0, Servlets, JavaScript, Spring, Struts, Ajax, HTML, jQuery, ClearCase, ClearQuest, Windows XP
Responsibilities:
- Worked on requirement analysis and design, interacting with the business teams
- Implemented a database-driven left navigation tree menu for the admin module using the Ajax4jsf framework
- Developed a validation framework to show custom validation on JSF screens
- Participated in the entire SDLC of the project
- Developed UI screens by using HTML, JSPs, CSS, jQuery, Ajax
- Wrote extensive core Java within the application
- Developed the business layer using Spring, Hibernate, and DAOs
- Used JavaScript and jQuery for validation and JDBC for all database interactions
- Used Code Collaborator for code review
- Created server-side Java architecture using Java Servlets (see the sketch following this list)
- Developed and deployed EJBs, Servlets, and JSPs on WebLogic Server
- Used MySQL as a database product
- Used Eclipse as the IDE for the development
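A minimal sketch of the Servlet-plus-JDBC pattern used in this role; the JDBC URL, credentials, table, and column names are placeholders, and a MySQL driver is assumed to be on the classpath.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet that looks up a customer name by id over JDBC.
public class CustomerServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/plain");
        String id = req.getParameter("id");

        String url = "jdbc:mysql://localhost:3306/appdb";   // placeholder connection details
        try (Connection conn = DriverManager.getConnection(url, "appuser", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT name FROM customers WHERE id = ?")) {
            ps.setString(1, id);
            try (ResultSet rs = ps.executeQuery();
                 PrintWriter out = resp.getWriter()) {
                out.println(rs.next() ? rs.getString("name") : "not found");
            }
        } catch (SQLException e) {
            throw new ServletException(e);
        }
    }
}
```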