Hadoop Developer Resume

Dallas, TX

PROFESSIONAL SUMMARY:

  • Worked with a variety of file formats, including delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files.
  • Expertise in inbound and outbound data transfer (importing/exporting) from/to traditional RDBMS using Apache Sqoop.
  • Good exposure to performance tuning of Hive queries, Pig scripts, and Sqoop jobs.
  • Worked in multi-cluster environments and set up the Cloudera and Hortonworks Hadoop ecosystems.
  • Skilled in data management, extraction, manipulation, validation, and analysis of large volumes of data.
  • Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
  • Good knowledge of Spark and MapReduce jobs.
  • Tuned Pig and Hive scripts by analyzing their joins, grouping, and aggregations. Worked extensively on HiveQL join operations, wrote custom UDFs, and have good experience optimizing Hive queries.
  • Good knowledge of Oracle Database and strong SQL query-writing skills.
  • Experience using the SQL Server Import/Export Wizard to migrate heterogeneous sources such as Oracle and MS Access databases, Excel, and flat files to SQL Server.
  • Experience transferring streaming data from different data sources into HDFS and HBase using Apache Flume.
  • Well versed in Talend Big Data, Hadoop, and Hive; used Talend Big Data components such as HDFS Output, HDFS Input, and Hive Load.
  • Utilized standard Python modules such as csv and pickle for development.
  • Worked with data frames and MySQL; queried MySQL databases from Python using the Python MySQL connector and the MySQLdb package to retrieve information.
  • Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Experience creating ETL/Talend jobs, covering both design and code, to process data into target databases.
  • Worked with several Python libraries, including NumPy, Pandas, and Matplotlib.
  • Experience creating Talend jobs to load data into various Oracle tables. Utilized Oracle stored procedures and wrote some Java code.
  • Hands-on experience with NoSQL databases such as MongoDB, HBase, and Cassandra and their integration with Hadoop clusters.
  • Experience working with Oracle, DB2, SQL Server, and MySQL databases.
  • Replicated tables across platforms and created materialized views.
  • Good exposure to the software development life cycle.
  • Supported the team using Talend as an ETL tool to transform and load data from different databases.
  • Excellent communication and interpersonal skills; flexible and adaptive to new environments; self-motivated team player and positive thinker who enjoys working in multicultural environments.
  • Analytical, organized, and enthusiastic about working in a fast-paced, team-oriented environment. Experienced in interacting with business users, understanding their requirements, and providing matching solutions.
  • Proactive in time management, with good problem-solving and analytical skills.

TECHNICAL SKILLS:

Programming Languages: Java, C, Python, Shell Scripting

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Hue, Impala, Sqoop, Apache Spark, Apache Kafka, Apache Ignite, Apache NiFi, Oozie, Flume, ZooKeeper, YARN

NoSQL Databases: MongoDB, HBase, Cassandra

Hadoop Distribution: Hortonworks, Cloudera, MapR

Databases: Oracle 10g, MySQL, MSSQL

IDE/Tools: Eclipse, NetBeans, Maven

Version Control: Git, SVN, ClearCase

Platforms: Windows, Unix, Linux

BI Tools: Tableau, MS Excel

Web/Server Application: Apache Tomcat, WebLogic, WebSphere, MSSQL Server, Oracle Server

Web Technologies: HTML, CSS, JavaScript, jQuery, JSP, Servlets, Ajax

PROFESSIONAL EXPERIENCE:

Confidential - Dallas, TX

Hadoop Developer

Responsibilities:

  • Worked on the Hortonworks Data Platform Hadoop distribution, using Hive to store, query, and retrieve data.
  • Created ETL/Talend jobs, covering both design and code, to process data into target databases.
  • Created Talend jobs to load data into various Oracle tables. Utilized Oracle stored procedures and wrote some Java code to capture global map variables and use them in the job.
  • Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Kafka.
  • Involved in data ingestion into HDFS, using Sqoop for full loads and Kafka for incremental loads from a variety of sources such as web servers, RDBMS, and data APIs.
  • Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes.
  • Implemented a custom input format and record reader to read XML input efficiently using a SAX parser.
  • Collected log data from web servers and integrated it into HDFS using Kafka.
  • Created Hive tables and loaded data into them for querying with HiveQL.
  • Developed Oozie workflows for scheduling Pig and Hive scripts.
  • Created custom user-defined functions in Python for Pig.
  • Developed Python mapper and reducer scripts and ran them using Hadoop streaming.
  • Worked with NiFi to manage the flow of data from source to HDFS.
  • Involved in creating the workflow to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Involved in setting up the QA environment by implementing Pig and Sqoop scripts.
  • Developed Pig Latin scripts to sort, join, and filter source data.
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
  • Designed, developed, and improved complex ETL structures to extract, transform, and load data from multiple data sources into the data warehouse and other databases based on business requirements.
  • Streamed data into Apache Ignite by setting up caches for efficient data analysis.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Used the Python subprocess module to call UNIX shell commands to check whether directories or files exist.
  • Stored DataFrames as Hive tables using PySpark.
  • Proactively monitored systems and services, and worked on the architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Modified reports based on the feedback from QA testers and users in development and staging environments.
  • Exported data to the Tableau reporting tool and created dashboards over a live connection.
  • Manipulated and serialized multiple data file formats such as JSON and XML.

Environment: Hadoop Hortonworks Distribution, MapReduce (YARN), Python, Spark, Apache NiFi, Apache Ignite, ETL, Talend, HDFS, Pig, Hive, Kafka, Cassandra, Eclipse, Sqoop, Splunk, Linux shell scripting.
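
As an illustration of the Python mapper and reducer work with Hadoop streaming listed above, a minimal sketch of such a pair might look like the following. The record layout (the first whitespace-separated field is the key) and the function names `mapper` and `reducer` are assumptions made for illustration; this is not the project's actual code.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'key<TAB>1' per record; the key is assumed to be the first field."""
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            yield f"{fields[0]}\t1"


def reducer(lines):
    """Sum the counts for each key.

    Hadoop streaming hands the reducer mapper output sorted by key,
    so consecutive lines with the same key can be grouped directly.
    """
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


# Local simulation of the map -> sort -> reduce flow on hypothetical records.
sample = ["10.0.0.1 /home", "10.0.0.2 /cart", "10.0.0.1 /checkout"]
for row in reducer(sorted(mapper(sample))):
    print(row)
```

In a real streaming job, each function would typically live in its own script that reads `sys.stdin` and prints to standard output, with Hadoop performing the sort between the two stages.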

Confidential - Jessup, PA

Hadoop Developer

Responsibilities:

  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters using agile methodology.
  • Monitored multiple Hadoop cluster environments, tracking workload, job performance, and capacity planning with Cloudera Manager.
  • Designed and developed Sqoop scripts to extract data from a relational database into Hadoop.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile devices, and network devices, and pushed it to HDFS.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
  • Ran MapReduce programs on log data to transform it into a structured form and derive user location, age group, and time spent.
  • Used PySpark SQL to speed up SQL scripts, setting the number of executors and the executor memory used to run the pipeline.
  • Used the Python subprocess module to call UNIX shell commands to verify that files exist.
  • Exported the analyzed data to relational databases using Sqoop for visualization and for report generation with business intelligence tools.
  • Proactively monitored systems and services, and worked on the architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Participated in functional reviews, test specifications, and documentation review.
  • Documented system processes and procedures for future reference; responsible for managing data coming from different sources.

Environment: Cloudera Hadoop Distribution, HDFS, Talend, MapReduce (Java), Python, Impala, Pig, Sqoop, Hive, Oozie, HBase, Shell Scripting.
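
The file-existence check via the Python subprocess module mentioned above can be sketched roughly as follows. The helper name `path_exists` and its `command` parameter are hypothetical, and the sketch assumes a Unix-like system where the standard `test` utility is available.

```python
import subprocess


def path_exists(path, command=("test", "-e")):
    """Return True if the given shell check exits with status 0.

    By default this runs the POSIX `test -e <path>` utility, which
    succeeds when the file or directory exists.
    """
    result = subprocess.run([*command, path], check=False)
    return result.returncode == 0


print(path_exists("/"))                      # root directory of the local filesystem
print(path_exists("/no/such/path/xyz123"))   # hypothetical path, assumed not to exist
```

The same helper could be pointed at HDFS by passing `command=("hdfs", "dfs", "-test", "-e")`, assuming the Hadoop client is installed and configured on the machine.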

Confidential

Hadoop Developer

Responsibilities:

  • Coordinated with business customers to gather business requirements, worked with technical peers to derive technical requirements, and delivered the BRD and TDD documents.
  • Analyzed the Hadoop cluster and different big data analytics tools, including Pig, Hive, HBase, and Sqoop.
  • Set up 10-node Hadoop clusters with IBM BigInsights.
  • Extracted data from Oracle into HDFS using Sqoop to store it and generate reports for visualization.
  • Collected and aggregated large amounts of web log data from different sources, such as web servers and mobile devices, using Apache Flume and stored the data in HDFS for analysis.
  • Developed Hive scripts to analyze data and categorize mobile numbers into segments, so that promotions could be offered to customers based on their segment.
  • Extensive experience writing Pig scripts to transform raw data into baseline data.
  • Developed UDFs in Java as needed for use in Pig and Hive queries.
  • Worked on the Oozie workflow engine for job scheduling.
  • Created Hive tables and partitions and loaded data for analysis using HiveQL queries.
  • Leveraged the Solr API to search user interaction data for relevant matches.
  • Designed the Solr schema and used the Solr client API for storing, indexing, and querying the schema fields.
  • Loaded data into HBase using bulk load and the HBase API.

Environment: Hortonworks Hadoop Distribution, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, Solr, Shell script.

Confidential

Java/J2EE Developer

Responsibilities:

  • Analyzed project requirements for the product and participated in design using UML.
  • Interacted with system analysts and business users for design and requirement clarification.
  • Made extensive use of HTML5 with AngularJS, JSTL, JSP, jQuery, and Bootstrap for the presentation layer, along with JavaScript for client-side validation.
  • Handled the Java multithreading part of back-end components.
  • Developed HTML reports for various modules as per the requirements.
  • Developed web services using SOAP, SOA, WSDL, and Spring MVC, and developed DTDs and XSD schemas for XML (parsing, processing, and design) to communicate with the Active Directory application using a RESTful API.
  • Created multiple RESTful web services using the Jersey 2 framework.
  • Used AquaLogic BPM (Business Process Management) for workflow management.
  • Developed the application using MongoDB (NoSQL) for storing data on the server.
  • Developed the complete business tier with stateful session Java beans and CMP Java entity beans with EJB 2.0.
  • Developed integration services using SOA, web services, SOAP, and WSDL.
  • Designed, developed, and maintained the data layer using the Hibernate ORM framework.
  • Used the Spring framework's JMS support for writing to the JMS queue and Hibernate DAO support for interfacing with the database, and integrated Spring with JSF.
  • Wrote unit test cases using JUnit and participated in integration testing.

Environment: Java, J2EE, HTML, CSS, JSP, JavaScript, Bootstrap, AngularJS, Servlets, JDBC, EJB, JavaBeans, Hibernate, Spring MVC, RESTful, JMS, MQ Series, AJAX, WebSphere Application Server, SOAP, XML, MongoDB, JUnit, Rational Suite, CVS Repository.

Confidential

Oracle PL/SQL Developer

Responsibilities:

  • Developed Oracle PL/SQL stored procedures, functions, packages, and SQL scripts.
  • Worked with users and application developers to identify business needs and provide solutions.
  • Created Database Objects, such as Tables, Indexes, Views, and Constraints.
  • Enforced database integrity using primary keys and foreign keys.
  • Tuned pre-existing PL/SQL programs for better performance.
  • Created many complex SQL queries and used them in Oracle Reports to generate reports.
  • Implemented data validations using Database Triggers.
  • Used import/export utilities such as UTL_FILE for data transfer between tables and flat files.
  • Performed SQL tuning using Explain Plan.
  • Provided support in the implementation of the project.
  • Worked with built-in Oracle standard packages such as DBMS_SQL, DBMS_JOB, and DBMS_OUTPUT.
  • Created and implemented report modules in the database from the client system using Oracle Reports as per the business requirements.
  • Used PL/SQL dynamic procedures during package creation.

Environment: Oracle 9i, Oracle Reports, SQL, PL/SQL, SQL*Plus, SQL*Loader, Windows XP.
