
Hadoop Developer Resume


Austin, TX

SUMMARY

  • Technically accomplished professional with 8+ years of total experience in software development and requirement analysis in Agile environments, including 5 years of Big Data ecosystem experience in the ingestion, storage, querying, processing, and analysis of Big Data.
  • In-depth knowledge and hands-on experience with Apache Hadoop components such as HDFS, MapReduce, HiveQL, HBase, Pig, Hive, Sqoop, Oozie, Cassandra, Flume, and Spark.
  • Very good understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • Extensively worked on the MRv1 and MRv2 Hadoop architectures.
  • Hands-on experience in writing MapReduce programs and Pig and Hive scripts.
  • Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Knowledge of RabbitMQ.
  • Knowledge of the Hortonworks distribution.
  • Loaded streaming log data from various web servers into HDFS using Flume.
  • Experience in building Pig scripts to extract, transform, and load data onto HDFS for processing.
  • Excellent knowledge of data mapping and of extracting, transforming, and loading data from different data sources.
  • Experience in writing HiveQL queries to store processed data in Hive tables for analysis.
  • Excellent understanding and knowledge of NoSQL databases such as HBase and Cassandra.
  • Expertise in database design, creation and management of schemas, writing stored procedures, functions, DDL, DML, SQL queries, and data modeling.
  • 4 years of IT experience in ETL architecture, development, enhancement, maintenance, production support, data modeling, data profiling, and reporting, including business and system requirement gathering.
  • 2 years of hands-on experience in shell scripting.
  • Knowledge of cloud services: Amazon Web Services (AWS) and Azure.
  • Knowledge of Elastic MapReduce (EMR).
  • Proficient in RDBMS concepts with Oracle, SQL Server, and MySQL.
  • Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
  • Very good experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Worked extensively with CDH3 and CDH4.
  • Skilled in leadership, self-motivated, and able to work effectively in a team.
  • Possess excellent communication and analytical skills along with a can-do attitude.
  • Strong work ethic with a desire to succeed and make significant contributions to the organization.

TECHNICAL SKILLS

Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Oozie, ZooKeeper, Apache Kafka, Cassandra, StreamSets, Impyla, Solr

Programming Languages: Java (JDK 1.4/1.5/1.6/1.8), C, HTML, SQL, PL/SQL, Python, Scala

Client Technologies: jQuery, JavaScript, AJAX, CSS, HTML5, XHTML, D3, AngularJS

Operating Systems: UNIX, Windows, Linux

Application Servers: IBM WebSphere, Apache Tomcat, WebLogic

Web Technologies: JSP, Servlets, JNDI, JDBC, JavaBeans, JavaScript, Web Services (JAX-WS)

Databases: Oracle 8i/9i/10g & MySQL 4.x/5.x

Java IDEs: Eclipse 3.x, IBM WebSphere Application Developer, IBM RAD 7.0

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential, Austin, TX

Responsibilities:

  • Developed PySpark code to read data from Hive, group the fields, and generate XML files (see the sketch after this list).
  • Enhanced the PySpark code to write the generated XML files to a directory and zip them into CDAs.
  • Implemented a REST call to submit the generated CDAs to the vendor website.
  • Implemented Impyla to support JDBC/ODBC connections to HiveServer2.
  • Enhanced the PySpark code to replace Spark with Impyla.
  • Installed Impyla on the edge node.
  • Evaluated Spark application performance by testing cluster deployment mode against local mode.
  • Experimented with Test OID submissions to the vendor website.
  • Explored StreamSets Data Collector.
  • Implemented the StreamSets Data Collector tool for ingestion into Hadoop.
  • Created a StreamSets pipeline to parse files in XML format and convert them to a format that is fed to Solr.
  • Built a data validation dashboard in Solr to display the message records.
  • Wrote a shell script to run a Sqoop job for bulk data ingestion from Oracle into Hive.
  • Created Hive tables for the ingested data.
  • Scheduled an Oozie job to run the Sqoop data ingestion.
  • Worked with the JSON file format in StreamSets.
  • Implemented a POC creating S3 buckets and managing S3 bucket policies, and utilized S3 and Glacier for storage and backup on AWS.
  • Transferred data from AWS S3 to AWS Redshift using Informatica.
  • Leveraged the AWS platform for big data analytics: MapReduce algorithms, ad-hoc Hive querying, and Sqoop for pulling structured data from SQL Server into the Hive warehouse.
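
A minimal sketch of the kind of PySpark job described in the first bullet above: read a Hive table, group records by a key, and write one XML file per group. The table, column, and path names are hypothetical placeholders, not the actual project values.

```python
# Hypothetical sketch: read a Hive table with PySpark, group rows by a key,
# and render each group as an XML file on the driver. All names are placeholders.
import os
import xml.etree.ElementTree as ET
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-to-xml").enableHiveSupport().getOrCreate()

# Pull only the columns the XML needs from the (placeholder) source table.
records = spark.sql("SELECT record_id, field_name, field_value FROM staging.cda_source")

out_dir = "/tmp/cda_xml"
os.makedirs(out_dir, exist_ok=True)

# Group on the driver; acceptable for a sketch, not for very large groups.
grouped = records.rdd.map(lambda r: (r.record_id, r)).groupByKey()
for record_id, rows in grouped.toLocalIterator():
    doc = ET.Element("document", id=str(record_id))
    for row in rows:
        ET.SubElement(doc, "field", name=row.field_name).text = str(row.field_value)
    ET.ElementTree(doc).write(os.path.join(out_dir, "{}.xml".format(record_id)))

spark.stop()
```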

Environment: Sqoop, StreamSets, Impyla, PySpark, Solr, Oozie, Hive, Impala, Informatica, AWS

Hadoop Developer

Confidential, Westlake, TX

Responsibilities:

  • Evaluated Spark's performance against Impala on transactional data.
  • Used Spark transformations and aggregations in Python and Scala to compute the min, max, and average of transactional data (a PySpark sketch follows this list).
  • Migrated HiveQL workloads to Spark SQL using Scala.
  • Experience loading data into Spark DataFrames.
  • Experience handling Hive queries using Spark SQL integrated with the Spark environment.
  • Used Java to develop a RESTful API for a database utility project.
  • Responsible for performing extensive data validation using Hive.
  • Designed a Cassandra data model (POC) for storing server performance data.
  • Implemented a data service as a REST API project to retrieve server utilization data from this Cassandra table.
  • Implemented a Python script to call the Cassandra REST API, perform transformations, and load the data into Hive.
  • Designed a data model to ingest transactional data with and without URIs into Cassandra.
  • Implemented a shell script that calls a Python script to compute the min, max, and average of utilization data for thousands of hosts, and compared performance at various levels of summarization.
  • Created Oozie workflow and coordinator jobs to kick off Hive jobs on time as data becomes available.
  • Generated reports from this Hive table for visualization purposes.
  • Migrated HiveQL to Spark SQL to compare Spark's performance with Hive's.
  • Implemented proofs of concept for DynamoDB, Redshift, and EMR.
  • Proactively researched Microsoft Azure.
  • Presented a demo on Microsoft Azure giving an overview of cloud computing with Azure.
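
A minimal PySpark sketch of the aggregation described above: computing the min, max, and average of utilization metrics grouped by host. The Hive table and column names are hypothetical placeholders.

```python
# Hypothetical sketch: min/max/avg aggregation over utilization data with PySpark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("utilization-summary").enableHiveSupport().getOrCreate()

# Placeholder Hive table holding per-host utilization samples.
txns = spark.table("metrics.host_utilization")

summary = (
    txns.groupBy("host")
        .agg(F.min("cpu_pct").alias("min_cpu"),
             F.max("cpu_pct").alias("max_cpu"),
             F.avg("cpu_pct").alias("avg_cpu"))
)

# Persist the summarized view back to Hive for reporting.
summary.write.mode("overwrite").saveAsTable("metrics.host_utilization_summary")
spark.stop()
```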

Environment: Hadoop, Azure, AWS, HDFS, Hive, Hue, Oozie, Java, Linux, Cassandra, Python, OpenTSDB, Scala

Hadoop Developer

Confidential, Chicago, IL

Responsibilities:

  • Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from the edge node to HDFS using shell scripting.
  • Created HBase tables to store variable formats of PII data coming from different portfolios.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, ZooKeeper, and Spark.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Used Pig to store the data in HBase.
  • Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL (see the sketch after this list).
  • Used Pig to parse the data and store it in Avro format.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
  • Implemented a script to transmit information from Oracle to HBase using Sqoop.
  • Worked on tuning the performance of Pig queries.
  • Involved in writing shell scripts to export log files to the Hadoop cluster through an automated process.
  • Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and sequence files for log data.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
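
A sketch of the kind of Hive DDL behind the dynamic-partition and bucketing work above, submitted to HiveServer2 through Impyla (listed in the skills section; its use for this particular project is an assumption). The database, table, column, and path names are hypothetical.

```python
# Hypothetical sketch: create a partitioned, bucketed Hive external table and load it
# with dynamic partitioning via HiveServer2 and Impyla. All names are placeholders.
from impala.dbapi import connect

# auth_mechanism='PLAIN' is the usual setting for an unsecured HiveServer2 with SASL.
conn = connect(host="hs2.example.com", port=10000, user="etl", auth_mechanism="PLAIN")
cur = conn.cursor()

# Allow partition values to come from the data itself.
cur.execute("SET hive.exec.dynamic.partition=true")
cur.execute("SET hive.exec.dynamic.partition.mode=nonstrict")

cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS portfolio.txn_events (
        account_id STRING,
        amount     DOUBLE,
        event_ts   TIMESTAMP
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (account_id) INTO 32 BUCKETS
    STORED AS ORC
    LOCATION '/data/portfolio/txn_events'
""")

# Dynamic-partition insert: event_date is derived in the SELECT, not hard-coded.
cur.execute("""
    INSERT OVERWRITE TABLE portfolio.txn_events PARTITION (event_date)
    SELECT account_id, amount, event_ts, to_date(event_ts) AS event_date
    FROM portfolio.txn_events_staging
""")
conn.close()
```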

Environment: Hadoop, HDFS, Pig, Sqoop, Spark, MapReduce, Cloudera, Snappy, ZooKeeper, NoSQL, HBase, shell scripting, Ubuntu, Red Hat Linux

Hadoop Developer

Confidential, IL

Responsibilities:

  • Worked on writing transformer/mapping MapReduce pipelines using Java.
  • Handled structured and unstructured data and applied ETL processes.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Involved in loading data into HBase using the HBase shell, the HBase client API, Pig, and Sqoop.
  • Designed and implemented incremental imports into Hive tables (a sketch follows this list).
  • Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Extensively used Pig for data cleansing.
  • Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems, including loading data into HDFS.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Facilitated production move-ups of ETL components from the acceptance to the production environment.
  • Experienced in managing and reviewing Hadoop log files.
  • Migrated ETL jobs to Pig scripts to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Worked with the Avro data serialization system to handle JSON data formats.
  • Worked on different file formats such as sequence files, XML files, and map files using MapReduce programs.
  • Developed scripts and automated end-to-end data management and synchronization between all the clusters.
  • Involved in setup and benchmarking of Hadoop/HBase clusters for internal use.
  • Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.
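
A rough sketch of a Sqoop incremental import into a Hive table, as described in the bullet above. The resume does not show the exact command, so the JDBC URL, credentials, and table names here are placeholders, and the command is wrapped in a small Python helper purely for illustration.

```python
# Hypothetical sketch: run a Sqoop incremental (append) import into a Hive table.
# The work described used shell-scripted Sqoop jobs; this wraps the same command in Python.
import subprocess

def incremental_import(last_value: str) -> None:
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//db.example.com:1521/ORCL",  # placeholder JDBC URL
        "--username", "etl",
        "--password-file", "/user/etl/.sqoop_pwd",
        "--table", "TXN_EVENTS",
        "--incremental", "append",       # only pull rows newer than the checkpoint
        "--check-column", "EVENT_ID",
        "--last-value", last_value,
        "--hive-import",
        "--hive-table", "portfolio.txn_events",
        "--num-mappers", "4",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # In practice the checkpoint would come from the previous run's saved last value.
    incremental_import("0")
```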

Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, Cassandra, Cloudera Hadoop distribution, PL/SQL, Windows, UNIX shell scripting

Hadoop Developer

Confidential, Atlanta, GA

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote multiple MapReduce programs in Java for data analysis.
  • Wrote MapReduce jobs using Pig Latin and the Java API.
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Developed Pig scripts for analyzing large data sets in HDFS.
  • Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
  • Designed and presented a plan for a POC on Impala.
  • Migrated HiveQL to Impala to minimize query response time.
  • Experience handling Hive queries using Spark SQL integrated with the Spark environment.
  • Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
  • Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Implemented daily jobs that automate parallel loading of data into HDFS using Autosys and Oozie coordinator jobs.
  • Streamed data into Apache Ignite by setting up a cache for efficient data analysis.
  • Responsible for performing extensive data validation using Hive.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Used Kafka to load data into HDFS and move data into NoSQL databases (Cassandra); a streaming sketch follows this list.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Involved in submitting and tracking MapReduce jobs using the JobTracker.
  • Involved in creating Oozie workflow and coordinator jobs to kick off jobs on time as data becomes available.
  • Used Pig as an ETL tool to do transformations, event joins, filtering, and some pre-aggregations.
  • Responsible for cleansing data from source systems using Ab Initio components such as Join, Dedup Sorted, Denormalize, Normalize, Reformat, Filter-by-Expression, and Rollup.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Implemented Hive generic UDFs to implement business logic.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Involved in a story-driven Agile development methodology and actively participated in daily Scrum meetings.
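
The Kafka-to-HDFS bullet above does not say which consumer technology was used; one way to sketch that path is with PySpark Structured Streaming, landing messages in HDFS as Parquet. The broker, topic, and path names are placeholders.

```python
# Hypothetical sketch: land Kafka messages in HDFS as Parquet with PySpark
# Structured Streaming. Requires the Spark-Kafka connector package on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
         .option("subscribe", "txn_events")                   # placeholder topic
         .option("startingOffsets", "latest")
         .load()
         .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
)

# Continuously append Parquet files to HDFS; checkpointing gives exactly-once sinks.
query = (
    events.writeStream.format("parquet")
          .option("path", "hdfs:///data/raw/txn_events")
          .option("checkpointLocation", "hdfs:///checkpoints/txn_events")
          .start()
)
query.awaitTermination()
```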

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Teradata, ZooKeeper, Autosys, HBase, Cassandra, Apache Ignite

Hadoop Developer

Confidential, New York, NY

Responsibilities:

  • Involved in the design and development phases of the Software Development Life Cycle (SDLC) using Scrum methodology.
  • Worked on analyzing the Hadoop cluster using different big data analytics tools, including Pig, Hive, and MapReduce.
  • Developed a data pipeline using Flume and Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Used Pig to perform data validation on the data ingested using Sqoop and Flume, and pushed the cleansed data set into HBase.
  • Participated in the development and implementation of the Cloudera Hadoop environment.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with ZooKeeper, Oozie, and data pipeline operational services for coordinating the cluster and scheduling workflows.
  • Designed and built the reporting application, which uses Spark SQL to fetch and generate reports on HBase table data.
  • Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
  • Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Involved in running MapReduce jobs to process millions of records.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Hive queries and Pig scripts to analyze large datasets.
  • Involved in importing and exporting data between RDBMS and HDFS using Sqoop.
  • Involved in generating ad-hoc reports using Pig and Hive queries.
  • Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for reporting on the dashboard (see the sketch after this list).
  • Provided operational support for Hadoop and MySQL databases.
  • Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
  • Loaded the aggregated data from the Hadoop environment into Oracle using Sqoop for reporting on the dashboard.
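
A sketch of the Hive-HBase integration mentioned above: an external Hive table mapped onto an HBase table with the HBaseStorageHandler, then queried for dashboard metrics. The HiveServer2 host, table, and column-family names are placeholders, and submitting the statements through Impyla is an assumption made for the sketch.

```python
# Hypothetical sketch: expose an HBase table to Hive via the HBaseStorageHandler,
# then compute dashboard metrics over it in HiveQL. All names are placeholders.
from impala.dbapi import connect

conn = connect(host="hs2.example.com", port=10000, user="etl", auth_mechanism="PLAIN")
cur = conn.cursor()

# Hive external table backed by an existing HBase table named "purchase_events".
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS web.purchase_events (
        rowkey     STRING,
        customer   STRING,
        amount     DOUBLE
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:customer,cf:amount")
    TBLPROPERTIES ("hbase.table.name" = "purchase_events")
""")

# Metrics over the HBase-backed table, computed in Hive for the dashboard.
cur.execute("""
    SELECT customer, COUNT(*) AS purchases, SUM(amount) AS total_spend
    FROM web.purchase_events
    GROUP BY customer
""")
for row in cur.fetchall():
    print(row)
conn.close()
```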

Environment: Red Hat Linux, HDFS, MapReduce, Hive, Java JDK 1.6, Pig, Sqoop, Flume, ZooKeeper, Oozie, Oracle, HBase

Java Developer

Confidential

Responsibilities:

  • Involved in analysis, design, implementation, and bug-fixing activities.
  • Involved in reviewing functional and technical specification documents.
  • Created and configured domains in production, development, and testing environments using the configuration wizard.
  • Involved in creating and configuring clusters in the production environment and deploying applications on the clusters.
  • Deployed and tested the application using the Tomcat web server.
  • Analyzed the specifications provided by the clients.
  • Involved in the design of the application.
  • Able to understand functional requirements and design documents.
  • Developed use case diagrams, class diagrams, sequence diagrams, and data flow diagrams.
  • Coordinated with other functional consultants.
  • Performed web-related development with JSP, AJAX, HTML, XML, XSLT, and CSS.
  • Created and enhanced stored procedures, PL/SQL, and SQL for the Oracle 9i RDBMS.
  • Designed and implemented a generic parser framework using a SAX parser to parse XML documents that store SQL.
  • Deployed the application on WebLogic Application Server 9.0.
  • Extensively used UNIX and FTP for shell scripting and pulling logs from the server.
  • Provided ongoing maintenance and support, working with the client to resolve problems, including major bug fixes.

Environment: Java 1.4, WebLogic Server 9.0, Oracle 10g, web services monitoring, Web Drive, UNIX/Linux, JavaScript, HTML, CSS, XML

Java Developer

Confidential

Responsibilities:

  • Involved in design and development phases of Software Development Life Cycle (SDLC).
  • Involved in designing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
  • Followed Agile methodology and Scrum meetings to track, optimize, and tailor features to customer needs.
  • Developed the user interface using JSP, JSP tag libraries, and JavaScript to simplify the complexities of the application.
  • Developed a Dojo-based front end, including forms and controls, and programmed event handling.
  • Created action classes that route submittals to the appropriate EJB components and render the retrieved information.
  • Used Core Java and object-oriented concepts.
  • Used JDBC to connect to the backend databases, Oracle and SQL Server 2005.
  • Wrote SQL queries and stored procedures for multiple databases, Oracle and SQL Server 2005.
  • Wrote stored procedures using PL/SQL and performed query optimization to achieve faster indexing and make the system more scalable.
  • Deployed the application on Windows using IBM WebSphere Application Server.
  • Used the Java Message Service (JMS) for reliable, asynchronous exchange of important information such as payment status reports.
  • Used web services (WSDL and REST) to get credit card information from a third party.
  • Used Ant scripts to build the application and deploy it on WebSphere Application Server.

Environment: Core Java, J2EE, Oracle, SQL Server, JSP, JDK, JavaScript, HTML, CSS, Web Services, Windows.
