
Hadoop Developer Resume

Columbia, South Carolina

SUMMARY:

  • Over 7 years of experience in the design and deployment of enterprise applications, web applications, client-server technologies, and web programming using Java and Big Data technologies.
  • 4+ years of comprehensive experience as a Hadoop, Big Data & Analytics developer.
  • Expertise in Hadoop architecture and ecosystem components such as HDFS, Hive, Sqoop, Flume, and Oozie.
  • Thorough understanding of Hadoop daemons such as JobTracker, TaskTracker, NameNode, and DataNode, and of the MRv1 and YARN architectures.
  • Experience in the installation, configuration, management, support, and monitoring of Hadoop clusters using various distributions such as Apache, Cloudera, and AWS.
  • Experience installing and configuring Hadoop stack elements: Hive, Sqoop, Flume, Oozie, MapReduce, HDFS, and ZooKeeper.
  • Experience in data processing and analysis using MapReduce and HiveQL.
  • Extensive experience writing user-defined functions (UDFs) in Hive and Pig.
  • Used Apache Sqoop to import and export data between HDFS and RDBMS/NoSQL databases.
  • Worked with NoSQL databases such as HBase and MongoDB.
  • Exposure to search, cache, and analytics data solutions such as Solr, Cassandra, and Hive.
  • Experience in job workflow scheduling and design using Oozie.
  • Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data, and of machine learning concepts.
  • Worked extensively with semi-structured data (fixed-length and delimited files) for data sanitization, report generation, and standardization.
  • Experienced in monitoring Hadoop clusters using Cloudera Manager and the web UI.
  • Extensive experience with web technologies such as HTML, CSS, XML, JSON, and jQuery.
  • Experienced with build tools (Maven, Ant) and continuous integration tools such as Jenkins.
  • Extensive experience documenting requirements, functional specifications, and technical specifications.
  • Extensive experience with SQL, PL/SQL, and database concepts.
  • Experience with version control tools such as SVN and Git (GitHub), JIRA for issue tracking, and Crucible for code reviews.
  • Strong database background with Oracle, PL/SQL, stored procedures, triggers, SQL Server, MySQL, and DB2.
  • Strong problem-solving and analytical skills, with the ability to make balanced, independent decisions.
  • Good team player with strong interpersonal, organizational, and communication skills, combined with self-motivation, initiative, and project-management attributes.
  • Strong ability to handle multiple priorities and workloads, and to understand and adapt to new technologies and environments quickly.

TECHNICAL SKILLS:

Hadoop Core Services: HDFS, MapReduce, Spark, YARN.

Hadoop Distributions: Hortonworks, Cloudera, Apache.

NoSQL Databases: HBase, Cassandra.

Hadoop Data Services: Hive, Impala, Sqoop, Flume, and Kafka.

Services: Zookeeper, Oozie.

Monitoring Tools: Cloudera Manager, Ganglia.

Cloud Computing Tools: Amazon AWS.

Languages: C, Java, Scala, Python, SQL, PL/SQL, HiveQL, JavaScript, Unix shell scripting.

Java & J2EE Technologies: Core Java, Servlets, Spring.

Application Servers: WebLogic, WebSphere, JBoss, Tomcat.

Databases: Oracle, MySQL, PostgreSQL, Teradata.

Operating Systems: UNIX, Windows, LINUX.

Build Tools: Maven, ANT.

Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans.

Development Methodologies: Agile/Scrum, Waterfall.

PROFESSIONAL EXPERIENCE:

Confidential, Columbia, South Carolina

Hadoop developer

Responsibilities:

  • Used the Spark API on Cloudera to perform analytics on data.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
  • Developed simple and complex MapReduce programs in Java for data analysis on different data formats.
  • Created Hive schemas using performance techniques such as partitioning and bucketing.
  • Involved in the complete end-to-end code deployment process in production.
  • Developed MapReduce programs that filter out bad and unnecessary records and identify unique records based on different criteria.
  • Gathered business and functional requirements with the product team, updated user comments in JIRA, and maintained documentation in Confluence.
  • Implemented MapReduce programs to classify data into different categories based on record type.
  • Implemented daily cron jobs that automate the parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.
  • Responsible for performing extensive data validation using Hive.
  • Worked with Sqoop import and export functionality to handle large data-set transfers between the Oracle database and HDFS.
  • Tuned Hive scripts to improve performance.
  • Submitted and tracked MapReduce jobs using JobTracker.
  • Created Oozie workflow and coordinator jobs to kick off jobs based on schedule and data availability.
  • Loaded generated HFiles into HBase for fast access to a large customer base without a performance hit.
  • Implemented Hive generic UDFs to implement business logic.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Used the JUnit framework for unit and integration testing.
  • Configured build scripts for multi-module projects with Maven.
  • Followed a story-driven agile development methodology and actively participated in daily scrum meetings.
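
Several of the bullets above describe MapReduce jobs that drop malformed records and keep only unique ones. A minimal sketch of that filtering logic in Hadoop Streaming style, written in Python for brevity (the pipe-delimited field layout and validity rules here are hypothetical, for illustration only):

```python
def is_valid(record, expected_fields=4):
    # Hypothetical rule: a record is "bad" unless it has exactly
    # `expected_fields` pipe-delimited fields and a non-empty key (field 1).
    parts = record.rstrip("\n").split("|")
    return len(parts) == expected_fields and parts[0] != ""

def mapper(lines):
    # Hadoop Streaming-style mapper: drop invalid records and emit only the
    # first occurrence of each key, so the output holds unique records.
    # In a real streaming job, `lines` would be sys.stdin and each yielded
    # record would be printed to stdout.
    seen = set()
    for line in lines:
        if not is_valid(line):
            continue
        key = line.split("|", 1)[0]
        if key not in seen:
            seen.add(key)
            yield line.rstrip("\n")
```

In production this de-duplication is usually done in the reduce phase (keys are already grouped after the shuffle); doing it in the mapper, as here, only de-duplicates within one input split.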

Environment: Hadoop, CDH4, MapReduce, HDFS, Pig, Hive, Impala, Oozie, Java, Kafka, Linux, Scala, Maven, JavaScript, Oracle 11g/10g, SVN, Ganglia.

Confidential, Dublin, Ohio

Spark Developer

Responsibilities:

  • Installed and set up Hadoop CDH clusters for development and production environments.
  • Configured Spark Streaming to consume ongoing data from Kafka and store the stream data in HDFS.
  • Used Spark and Spark SQL to read Parquet data and create Hive tables using the Scala API.
  • Used the Spark application master to monitor Spark jobs and capture their logs.
  • Implemented Spark jobs in Scala, using DataFrames and the Spark SQL API for faster testing and processing of data.
  • Worked on Sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Installed and configured Hive, Pig, Sqoop, Flume, Cloudera Manager, and Oozie on the Hadoop cluster.
  • Planned production cluster hardware and software installation and coordinated with multiple teams to complete it.
  • Monitored multiple Hadoop cluster environments using Hortonworks; monitored workload and job performance and collected metrics for the Hadoop cluster when required.
  • Installed Hadoop patches, updates, and version upgrades when required.
  • Installed and configured Cloudera Manager, Hive, Sqoop, and Oozie on the CDH4 cluster.
  • Implemented high availability and automatic failover for the NameNode using ZooKeeper services, removing a single point of failure.
  • Performed an upgrade in the development environment from CDH 4.2 to CDH 4.6.
  • Worked with big data developers, designers, and scientists to troubleshoot MapReduce and Hive jobs and tune them for high performance.
  • Automated end to end workflow from Data preparation to presentation layer for Artist Dashboard project using Shell Scripting.
  • Developed MapReduce programs to extract and transform data sets; the resulting data sets were loaded into Cassandra.
  • Orchestrated Sqoop scripts and Hive queries using Oozie workflows and sub-workflows.
  • Loaded generated files into MongoDB for fast access to a large customer base without a performance hit.
  • Proactively involved in ongoing maintenance, support, and improvements of the Hadoop cluster.
  • Performed data analytics in Hive, then exported the resulting metrics back to the Oracle database using Sqoop.
  • Involved in Minor and Major Release work activities.
  • Collaborated with business users, product owners, and developers to analyze functional requirements.
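
The Spark Streaming pipeline described above buckets the Kafka feed into short time windows before each batch is persisted to HDFS. A toy pure-Python illustration of that micro-batch idea (event shape, integer timestamps, and window size are invented for the example; a real job would use the Spark Streaming API, not this):

```python
def micro_batches(events, batch_seconds=10):
    # Group (timestamp, payload) events into fixed-size windows, the way a
    # streaming job buckets incoming Kafka records before persisting each
    # completed batch. Timestamps are plain integers (e.g. epoch seconds).
    batches = {}
    for ts, payload in events:
        window = ts - (ts % batch_seconds)  # start of this event's window
        batches.setdefault(window, []).append(payload)
    return [batches[w] for w in sorted(batches)]  # batches in time order
```

Each returned list corresponds to one micro-batch that the streaming job would flush to HDFS as a file or partition.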

Environment: Cloudera Hadoop, MapReduce, HDFS, Hortonworks, Cloudera Manager, Hive, Pig, Sqoop, Oozie, Flume, Linux, Zookeeper, LDAP.

Confidential, Nashville, TN

Big Data Engineer

Responsibilities:

  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with major Hadoop ecosystem components: Hive, HBase, Sqoop, Flume, Oozie, and ZooKeeper.
  • Implemented a six-node CDH4 Hadoop cluster on CentOS.
  • Imported and exported data between HDFS/Hive and different RDBMSs using Sqoop.
  • Defined job flows to run multiple MapReduce and Pig jobs using Oozie.
  • Imported log files into HDFS using Flume and loaded them into Hive tables for querying.
  • Monitored running MapReduce programs on the cluster.
  • Responsible for loading data from UNIX file systems to HDFS.
  • Used HBase-Hive integration and wrote multiple Hive UDFs for complex queries.
  • Wrote APIs to read HBase tables, cleanse data, and write to another HBase table.
  • Created multiple Hive tables and implemented partitioning, dynamic partitioning, and buckets in Hive for efficient data access.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
  • Experienced in writing programs using the HBase client API.
  • Loaded data into HBase using the HBase shell, the HBase client API, Pig, and Sqoop.
  • Experienced in the design, development, tuning, and maintenance of NoSQL databases.
  • Wrote MapReduce programs in Python using the Hadoop Streaming API.
  • Developed unit test cases for Hadoop MapReduce jobs with MRUnit.
  • Excellent experience in ETL analysis, design, development, testing, and implementation, including performance tuning and query optimization.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager and the web UI.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Used Maven as the build tool and SVN for code management.
  • Wrote RESTful web services for the application.
  • Implemented testing scripts to support test-driven development and continuous integration.
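
The Python Hadoop Streaming programs mentioned above pair a mapper with a reducer that receives its input already sorted by key from the shuffle phase. A minimal reducer sketch in that style, aggregating tab-separated counts (the `key<TAB>count` line format is an assumption for illustration):

```python
from itertools import groupby

def reducer(lines):
    # Hadoop Streaming-style reducer: input lines are "key<TAB>count",
    # pre-sorted by key by the shuffle; emit one "key<TAB>total" per key.
    # In a real job, `lines` is sys.stdin and each result is printed.
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (key, total)
```

Because `groupby` only merges adjacent equal keys, this works precisely because Hadoop guarantees sorted reducer input.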

Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Impala, Pig, Java, SQL, Ganglia, Sqoop, Flume, Oozie, Unix, JavaScript, Maven, Eclipse.

Confidential, Deerfield, IL

Hadoop Developer

Responsibilities:

  • Imported data from different relational data sources such as RDBMS and Teradata to HDFS using Sqoop.
  • Wrote transformer/mapping MapReduce pipelines using Apache Crunch and Java.
  • Imported bulk data into Cassandra using the Thrift API.
  • Implemented custom data types, InputFormat, RecordReader, OutputFormat, and RecordWriter for MapReduce computations to handle custom business requirements.
  • Created Hive tables, loaded them with data, and wrote Hive queries that invoke and run MapReduce jobs in the backend.
  • Performed analytics on time-series data in Cassandra using the Java API.
  • Designed and implemented incremental imports into Hive tables.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Collected, aggregated, and moved data from servers to HDFS using Apache Flume.
  • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Experienced in managing and reviewing Hadoop log files.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins, and pre-aggregations before storing the data in HDFS.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Worked with the Avro data serialization system to handle JSON data formats.
  • Worked on different file formats such as Sequence files, XML files, and Map files using MapReduce programs.
  • Performed unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
  • Exported data from HDFS into RDBMS using Sqoop for report generation and visualization purposes.
  • Developed scripts that automated data management end to end and kept all the clusters in sync.
  • Created and maintained technical documentation for launching Hadoop clusters.
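
The incremental imports into Hive noted above come down to watermark bookkeeping: load only rows changed since the last run, then advance the saved check-column value (the idea behind Sqoop's `--incremental lastmodified` mode). A simplified pure-Python sketch of that bookkeeping (the row shape and integer timestamps are assumptions for the example):

```python
def next_incremental_window(last_value, rows):
    # Select only rows whose "modified" stamp is newer than the saved
    # watermark, then advance the watermark to the newest stamp seen so
    # the next run picks up where this one left off.
    new_rows = [r for r in rows if r["modified"] > last_value]
    new_watermark = max((r["modified"] for r in new_rows), default=last_value)
    return new_rows, new_watermark
```

Sqoop persists the equivalent of `new_watermark` (its `--last-value`) between runs, typically via a saved job, so each import appends only the delta to the Hive table.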

Environment: Hadoop, HDFS, Hortonworks (HDP 2.1), MapReduce, Hive, Oozie, Sqoop, Pig, MySQL, Java, REST API, Maven, MRUnit, JUnit.

Confidential, Cleveland, Ohio

Java Developer/ Data Engineer

Responsibilities:

  • Designed, developed, maintained, tested, and troubleshot Java and PL/SQL programs in support of payroll employees.
  • Developed documentation for new and existing programs; designed specific enhancements to the application.
  • Implemented the web layer using JSF and ICEfaces.
  • Implemented the business layer using Spring MVC.
  • Implemented report retrieval based on start date using HQL.
  • Implemented session management using SessionFactory in Hibernate.
  • Developed the DOs and DAOs using Hibernate.
  • Implemented a SOAP web service to validate zip codes using Apache Axis.
  • Wrote a PL/SQL program to send email to a group from the backend.
  • Developed scripts triggered monthly to produce the current monthly analysis.
  • Scheduled jobs to be triggered on a specific day and time.
  • Modified SQL statements to increase overall performance as part of basic performance tuning and exception handling.
  • Used cursors, arrays, tables, and bulk-collect concepts.
  • Extensively used log4j for logging.
  • Performed unit testing in all environments.
  • Used Subversion as the version control system.

Environment: Java (JDK 1.5), J2EE, Eclipse, JSP, JavaScript, JSTL, Ajax, GWT, Log4j, CSS, XML, Spring, EJB, MDB, Hibernate, WebLogic, REST, Rational Rose, JUnit, Maven, JIRA, SVN.

Confidential

Java Developer

Responsibilities:

  • Involved in all the phases of the life cycle of the project from requirements gathering to quality assurance testing.
  • Developed Class diagrams, Sequence diagrams using Rational Rose.
  • Developed rich web interface modules with Struts tags, JSP, JSTL, CSS, JavaScript, Ajax, and GWT.
  • Developed the presentation layer using the Struts framework and performed validations using the Struts Validator plugin.
  • Created SQL scripts for the Oracle database.
  • Implemented the business logic using Spring transactions and Spring AOP.
  • Implemented the persistence layer using Spring JDBC to store and update data in the database.
  • Produced a web service using the WSDL/SOAP standard.
  • Implemented J2EE design patterns such as Singleton combined with Factory.
  • Extensively involved in the creation of the Session Beans and MDB, using EJB 3.0.
  • Used Hibernate framework for Persistence layer.
  • Extensively involved in writing Stored Procedures for data retrieval and data storage and updates in Oracle database using Hibernate.
  • Deployed and built the application using Maven.
  • Performed testing using JUnit.
  • Used JIRA to track bugs.
  • Extensively used Log4j for logging throughout the application.
  • Produced a Web service using REST with Jersey implementation for providing customer information.
  • Used SVN for source code versioning and code repository.

Environment: Java (JDK 1.5), J2EE, Eclipse, JSP, JavaScript, JSTL, Ajax, GWT, Log4j, CSS, XML, Spring, EJB, MDB, Hibernate, WebLogic, REST, Rational Rose, JUnit, Maven, JIRA, SVN.
