
Hadoop Developer Resume

Fremont, CA

SUMMARY:

  • Over 7 years of professional experience as a software developer in the design, development, deployment and support of large-scale distributed systems.
  • Experience in big data technologies such as Hadoop frameworks, MapReduce, Hive, HBase, Pig, Sqoop, Spark, Kafka, Flume, ZooKeeper and Oozie.
  • Excellent knowledge and understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode and DataNode (Hadoop 1.x), and YARN concepts such as ResourceManager and NodeManager (Hadoop 2.x).
  • Experience with distributed systems, large scale non-relational data stores, RDBMS, NoSQL, MapReduce systems, Data Modeling, Database performance, and multi-terabyte data warehouse.
  • Experience in loading data from Enterprise data lake (EDL) to HDFS using Sqoop scripts.
  • Experience in designing, developing and testing ETL processes using Informatica and SSIS.
  • Experience in developing, supporting and maintaining ETL (Extract, Transform and Load) processes using Talend.
  • Experience in migrating data from relational databases (Oracle Exadata) and external sources to HDFS.
  • Experience in deploying industry-scale data lakes on cloud platforms.
  • Exposure to designing and creating HDFS data lakes by drawing relationships between data sources from various systems.
  • Good knowledge of Talend integration with AWS.
  • Experience developing and securing data-pipeline applications in large-scale environments using Apache Flink.
  • Extensively used Scala and Java; created frameworks for processing data pipelines through Spark.
  • Experience developing Python code to gather data from HBase and designing solutions implemented with PySpark.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs).
  • Extensive experience in the design, installation, configuration and management of Apache Hadoop clusters, Hadoop ecosystem components, and Spark with Scala as well as Python.
  • Extensive experience in Data Ingestion, Transformation, Analytics using Apache Spark framework, and Hadoop ecosystem components.
  • Expert in working with the Hive data warehouse: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Experience using Kafka clusters for data integration, secured cloud service platforms such as AWS, and the summarization, querying and analysis of large datasets stored on HDFS and the Amazon S3 filesystem using Hive Query Language (HiveQL).
  • Experience with Amazon AWS services such as Elastic MapReduce (EMR), S3 storage, EC2 instances and data warehousing.
  • Experience in working with different Reporting tools like Tableau and Power BI.
  • Experience streaming data from sources into Hadoop using Apache Storm.
  • Experience building stream-processing systems using solutions such as Storm, Spark Streaming or HDF.
  • Experience developing Oozie workflows for scheduling and orchestrating ETL processes.
  • Good knowledge of Spark architecture and real-time streaming using Spark.
  • Strong knowledge of implementing data processing on Spark Core using Spark SQL and Spark Streaming.
  • Hands-on experience with Spark SQL queries: importing data from data sources, performing transformations and read/write operations, and saving results to output directories in HDFS (see the sketch after this list).
  • Expertise in integrating the data from multiple data sources using Kafka.
  • Experience working with different Integrated Development Environments (IDEs) and Continuous Integration (CI)/Continuous Delivery (CD) tools.
  • Experience working with Apache SOLR for indexing and querying.
  • Strong experience in Hadoop development and testing big data solutions using the Cloudera and Hortonworks distributions, Amazon Web Services (AWS) and Azure.
  • Experience in active development as well as onsite coordination activities in web-based, client/server and distributed architectures using Java and J2EE, including web services, Spring, Struts, Hibernate and JSP/Servlets.
  • Good working knowledge of servers such as Tomcat and WebLogic 8.0.
  • Strong knowledge of software quality processes and the Software Development Life Cycle (SDLC).
  • Extensively worked on Java development tools, including Eclipse Galileo 3.5, Eclipse Helios 3.6, Eclipse Mars 4.5 and WSAD 5.1.2.
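
As a hedged illustration of the Spark SQL work summarized above (importing from a source, transforming, and writing results back to HDFS as a partitioned dataset), the sketch below uses a Spark 1.6-style HiveContext in Scala; the table, column and path names are hypothetical stand-ins rather than details taken from any project.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object DailyOrderSummary {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("daily-order-summary"))
        val hiveContext = new HiveContext(sc)

        // Aggregate a source Hive table; sales.orders and its columns are illustrative only.
        val summary = hiveContext.sql(
          """SELECT order_date, region, SUM(amount) AS total_amount
            |FROM sales.orders
            |GROUP BY order_date, region""".stripMargin)

        // Save the result to HDFS as Parquet, partitioned by order_date (path is hypothetical).
        summary.write
          .mode("overwrite")
          .partitionBy("order_date")
          .parquet("hdfs:///data/curated/order_summary")
      }
    }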

TECHNICAL EXPERTISE:

Big Data Ecosystem: Hadoop 0.22.0, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Azkaban

Java/J2EE: Java 6, Ajax, Log4j, JSP 2.1, Servlets 2.3, JDBC 2.0, XML, Java Beans

Methodologies: Agile, UML, Design Patterns

Frameworks: Struts, Hibernate, Spring

Database: Oracle 10g, PL/SQL, MySQL

Application Server: Apache Tomcat 5.x/6.0, JBoss 4.0

Web Tools: HTML, CSS, JavaScript, XML, XSL, XSLT, XPath, DOM

IDE/Testing Tools: NetBeans, Eclipse

Scripts: Bash, ANT, SQL, HiveQL, Shell Scripting

Testing API: JUnit

Reporting Tools: Tableau, Power BI

PROFESSIONAL EXPERIENCE:

Confidential, Fremont, CA

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism and memory usage.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcasts, effective and efficient joins, and transformations applied during the ingestion process itself.
  • Designed, developed and maintained data integration programs in Hadoop and RDBMS environments with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Hands-on experience with Talend - ETL (Extract-Transform-Load) tools.
  • Developed Talend jobs using context variables and scheduled the jobs to run automatically.
  • Designed and developed Hadoop ETL solutions to move data to the data lake using big data tools such as Sqoop, Hive, Spark, HDFS and Talend.
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, in order to adopt the former in the project.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file Formats in Hive.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Created, managed and modified logical and physical data models using a variety of data modeling philosophies and techniques.
  • Implemented Amazon EMR for processing big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), with storage on Amazon Simple Storage Service (S3).
  • Supported the ongoing migration of the data center to the Amazon cloud.
  • Extensively worked with S3 buckets in AWS.
  • Used reporting tools such as Tableau, connected to Hive, to generate daily data reports.
  • Performed data manipulation, data shaping and data cleansing.
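
A minimal sketch of the Kafka-to-Cassandra streaming path described in the bullets above, written in Scala against the Spark 1.6 DStream API with the spark-cassandra-connector. The broker, topic, keyspace, table and payload format are hypothetical stand-ins for the actual common learner data model, which is not detailed here.

    import com.datastax.spark.connector.streaming._
    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Illustrative event shape; the real learner data model is not shown in this resume.
    case class LearnerEvent(learnerId: String, eventTime: Long, payload: String)

    object LearnerStreamJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("learner-stream")
          .set("spark.cassandra.connection.host", "cassandra-host") // hypothetical host
        val ssc = new StreamingContext(conf, Seconds(10))

        // Hypothetical broker list and topic name.
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("learner-events"))

        stream
          .map { case (_, value) =>
            // Assume a pipe-delimited payload: learnerId|epochMillis|body
            val Array(id, ts, body) = value.split("\\|", 3)
            LearnerEvent(id, ts.toLong, body)
          }
          .saveToCassandra("learner", "events") // hypothetical keyspace and table

        ssc.start()
        ssc.awaitTermination()
      }
    }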

Environment: Hadoop YARN, Spark 1.6, Spark Streaming, Spark SQL, Scala, Python, PySpark, Kafka, Hive, Sqoop, Elasticsearch, Impala, Cassandra, data modeling, Solr, Tableau Desktop, Power BI Server, Talend, Oozie, Jenkins, Cloudera, AWS S3, Oracle 12c, Linux.

Confidential, Memphis, TN

Hadoop Developer

Responsibilities:

  • Experienced in implementing the Hortonworks distribution.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Involved in running Hadoop jobs to process millions of records of text data.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Worked on various Linux environments such as CentOS, Ubuntu and Red Hat.
  • Experience installing, upgrading and configuring Red Hat Linux via interactive installation.
  • Involved in loading data from the Linux file system to HDFS.
  • Responsible for managing data from multiple sources.
  • Experience working with Informatica ETL for data integration.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML data.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Created ETL jobs to load data into MongoDB and move data from MongoDB into the data warehouse.
  • Extracted data from MongoDB, HBase and Sqoop imports and placed it in HDFS for processing.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Comfortable coordinating with offshore teams for development and support activities.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Wrote custom user-defined functions (UDFs); see the sketch after this list.
  • Responsible for building scalable distributed data solutions using Hadoop cluster environment with Hortonworks distribution.
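
As a hedged sketch of the custom UDF work mentioned in this list, the snippet below shows a simple one-argument Hive UDF. It is written in Scala for consistency with the rest of this document, although any JVM language compiled against the Hive exec jar works; the class name and logic are purely illustrative.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Illustrative UDF: strips all non-digit characters from a string column.
    class CleanDigits extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.replaceAll("[^0-9]", ""))
      }
    }

Once packaged into a jar, a function like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being referenced in queries.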

Environment: Hortonworks, Hadoop, HDFS, Pig, Hive, MapReduce, MongoDB, Sqoop, Linux, Swift and Big Data

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Oversaw the design effort to develop technical solutions from analysis documents.
  • Exported data from DB2 to HDFS using Sqoop.
  • Extracted data from Teradata into HDFS using Sqoop and analyzed the data by running Hive queries.
  • Good knowledge of developing MapReduce programs using Apache Crunch.
  • Implemented high-performing ETL pipelines in Java on Hadoop.
  • Wrote MapReduce jobs using Pig Latin.
  • Developed workflows using Oozie for running MapReduce jobs and Hive queries.
  • Worked on cluster coordination services through ZooKeeper.
  • Worked on loading log data directly into HDFS using Flume.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML data.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
  • Experience in defining, designing and developing Java applications, especially using Hadoop MapReduce and leveraging frameworks such as Cascading and Hive.
  • Developed monitoring and performance metrics for Hadoop clusters.
  • Documented designs and procedures for building and managing Hadoop clusters.
  • Imported/exported data into HDFS/Hive from relational databases and Teradata using Sqoop.
  • Successfully loaded files into Hive and HDFS from MongoDB and Solr.
  • Automated deployment, management and self-serve troubleshooting of applications.
  • Defined and evolved the existing architecture to scale with growth in data volume, users and usage.
  • Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services (see the sketch after this list).
  • Experience in managing development time, bug tracking, project releases, development velocity, release forecasting, scheduling and more.
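
The Commerce API bullet above describes Java services connecting to Cassandra but gives no interface details, so the snippet below is only a hedged Scala sketch of the kind of connection and query logic involved, using the DataStax Java driver; the contact point, keyspace, table and columns are hypothetical.

    import com.datastax.driver.core.Cluster
    import scala.collection.JavaConverters._

    object CommerceCassandraClient {
      def main(args: Array[String]): Unit = {
        // Contact point and keyspace are hypothetical stand-ins.
        val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
        val session = cluster.connect("commerce")
        try {
          // Illustrative query; the actual Commerce API schema is not described here.
          val rows = session.execute("SELECT order_id, total FROM orders LIMIT 10")
          rows.asScala.foreach(row =>
            println(s"${row.getString("order_id")} -> ${row.getDecimal("total")}"))
        } finally {
          session.close()
          cluster.close()
        }
      }
    }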

Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, MySQL, Ubuntu, ZooKeeper, Java (JDK 1.6).

Confidential, Columbus, IN

Hadoop Developer

Responsibilities:

  • Worked on the installation and configuration of Hadoop, MapReduce and HDFS; developed MapReduce jobs in Java for data cleaning and preprocessing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Responsible for managing data coming from different sources.
  • Monitoring the running MapReduce programs on the cluster.
  • Responsible for loading data from UNIX file systems to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Collaborated with all other members of the team to take shared responsibility for the overall efforts the team committed to.
  • Developed scripts to automate data management end to end and keep all clusters in sync.
  • Worked with the Hadoop testing team to deliver high-quality software on time.
  • Enabled end users to perform user acceptance testing.
  • Executed regression tests comparing the outputs of new application code with existing code running in production.

Environment: Apache Hadoop, Java (JDK 1.6), DataStax, flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, UNIX, Sqoop, Hive, Oozie

Confidential, Chevy Chase, MD

Java Developer

Responsibilities:

  • Designed, configured and developed the web application using JSP, Jasper Report, JavaScript, HTML.
  • Defined and designed the layers and modules of the project using OOAD methodologies and standard J2EE design patterns and guidelines.
  • Designed and developed all user interfaces using JSP, Servlets and the Spring framework.
  • Developed the DAO layer using Hibernate and used caching system for real time performance.
  • Developed web service provider methods (bottom up approach) using WSDL, XML and SOAP for transferring data between the applications.
  • Configured Java Message Service (JMS) on the WebSphere server using the Eclipse IDE.
  • Used AJAX to develop asynchronous behavior on the client side of the web application.
  • Designed various application components using multi-threading concepts, mostly to perform time-consuming tasks in the background. Wrote JSP and Servlet classes to generate dynamic HTML pages. Designed class and sequence diagrams for the modify and add modules.
  • Designed and developed XML Processing components for dynamic menus on the application.
  • Maintained the existing code base developed in the Spring and Hibernate frameworks by incorporating new features and fixing bugs.
  • Developed stored procedures and triggers using PL/SQL to calculate and update tables implementing business logic in the Oracle database.
  • Used the Spring ORM module for integration with Hibernate in the persistence layer.

Environment: Java SE 7, Java EE 6, JSP 2.1, Servlets 3.0, HTML, JDBC 4.0, IBM WebSphere 8.0, PL/SQL, XML, Spring 3.0, Hibernate 4.0, Oracle 12c, ANT, JavaScript & jQuery, JUnit, Windows 7 and Eclipse 3.7

Confidential

Java Developer

Responsibilities:

  • Worked in an Agile development environment and participated in scrum meetings.
  • Developed web pages using the JSF framework, establishing communication between various pages in the application.
  • Developed high-level design documents, use case documents, detailed design documents and unit test plan documents, and created use cases, class diagrams and sequence diagrams using UML.
  • Extensive involvement in database design, development, coding of stored procedures, DDL & DML statements, functions and triggers.
  • Used Spring Tool Suite (STS) as the IDE for development.
  • Built a custom cross-platform architecture using Java, Spring Core/MVC, Hibernate through Eclipse IDE.
  • Involved in the development of web interfaces using the Struts MVC framework.
  • Developed the user interface using JSP and tag libraries, CSS, HTML and JavaScript.
  • Used stored procedures to interact with the database.
  • Used a session filter to implement timeouts for idle users.

Environment: Linux, MySQL, MySQL Workbench, J2EE, Struts 1.0, JavaScript, Swing, CSS, HTML, XML, XSLT, DTD, Eclipse, JUnit, EJB 2.2, Tomcat, WebLogic 7.0/8.1
