
Hadoop Developer Resume


Bellevue, WA

SUMMARY

  • 7 years of IT experience, including 3 years of experience with Apache Hadoop components like HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Spark, Scala and Big Data analytics, and 5 years of experience in database architecture, Core Java, JSP, Servlets, JavaScript, XML, jQuery, Python and Scala scripting.
  • Very good experience in the complete project life cycle (design, development, testing and implementation) of client-server and web applications.
  • Worked extensively on database programming, database architecture and Hadoop.
  • 3 years of hands-on experience working with HDFS, the MapReduce framework and Hadoop ecosystem components like Hive, HBase, Sqoop and Oozie.
  • Good understanding of Hadoop Architecture and underlying Hadoop framework including Storage Management.
  • Hands on experience in installing, configuring and using Hadoop components like MapReduce, HDFS, HBase, Hive and Sqoop.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
  • Worked on the backend using Scala and Spark to implement several pieces of aggregation logic (a representative sketch follows this summary).
  • Experienced in working with Spark DataFrames and optimizing SLAs.
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
  • Hands on experience in MicroStrategy and Tableau to generate reports on Hadoop data.
  • Worked on production deployment of month-end Hadoop release items.
  • Involved in creating POCs to ingest and process streaming data using Spark and HDFS.
  • Expert in SQL Server RDBMS and have worked extensively on PL/SQL.
  • Expert in writing complex SQL queries and analyzing databases for performance.
  • Very good understanding and working knowledge of Object Oriented Programming (OOP), multithreading in Core Java, J2EE, Web Services (REST, SOAP), JDBC, JavaScript and jQuery.
  • Worked and learned a great deal from AWS Cloud services like EC2, S3, EBS, and EMR.
  • Migrated an existing on-premises application to AWS; used AWS services like EC2 and S3 for small data set processing and storage, and experienced in maintaining the Hadoop cluster on AWS EMR.
  • Expert in building, deploying and maintaining applications.
  • Experienced in preparing and executing Unit Test Plan and Unit Test Cases after software development.
  • Experience in software development approaches such as Scrum, Agile, Spiral and Waterfall iterative models.
  • Worked in 24x7 environments to provide production support.
  • Coordinated with offshore and cross-functional teams to ensure that applications are properly tested, configured and deployed.
  • Experience with Business Intelligence and Data Warehouse (DW) applications.
  • Generated ETL reports using Tableau and created statistical dashboards for analytics.
  • Proficient in using data visualization tools like Tableau and MS Excel.
  • Excellent global exposure to various work cultures and client interaction with diverse teams.
  • Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
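
The aggregation work described above can be illustrated with a minimal Scala/Spark sketch. The table names (staging.transactions, reporting.daily_txn_summary) and columns (status, txn_date, region, amount, customer_id) are hypothetical placeholders, not the actual project schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-aggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source table; real table and column names differ.
    val txns = spark.table("staging.transactions")

    // Typical aggregation logic: daily totals and distinct customers per region.
    val daily = txns
      .filter(col("status") === "COMPLETED")
      .groupBy(col("txn_date"), col("region"))
      .agg(
        sum("amount").alias("total_amount"),
        countDistinct("customer_id").alias("unique_customers")
      )

    // Write the result back to a Hive table for downstream reporting.
    daily.write.mode("overwrite").saveAsTable("reporting.daily_txn_summary")

    spark.stop()
  }
}
```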

TECHNICAL SKILLS

Programming Languages: C, C++, Java, Scala.

Hadoop/ Big Data Stack: Hadoop, HDFS, YARN, MapReduce, Pig, Hive, Spark, Spark-SQL, Oozie, Zookeeper, HBase, Sqoop, Flume, Storm.

Hadoop Distributions: Cloudera, Hortonworks, MapR.

Databases: Oracle, MySQL, DB2, Teradata, SQL Server, Sybase.

NoSQL Databases: HBase, Cassandra.

Web Technologies: Java, Servlets, EJB, JavaScript, CSS, Bootstrap.

Frameworks: MVC, Struts, Spring and Hibernate.

IDEs: Eclipse, NetBeans, IntelliJ.

Build & Integration Tools: Maven, SBT.

Operating Systems: Windows, Linux, Unix and CentOS.

Query Language: HiveQL, Spark SQL, Pig, SQL, PL/SQL.

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential, Bellevue, WA

Responsibilities:

  • Designed the technical architecture and developed various Big Data workflows using custom MapReduce, Pig, Hive and Sqoop.
  • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Performed data ingestion from various data sources.
  • Performed data analysis in Hive.
  • Used Pig for data transformations, joins, filters and aggregation.
  • Integrated MapReduce with HBase to bulk-load large amounts of data into HBase using MapReduce programs.
  • Scheduled jobs through Oozie and tracked progress.
  • Involved in moving legacy data from the Sybase ASE data warehouse to the Hadoop Data Lake and migrating the data processing to the lake.
  • Responsible for creating data stores, datasets and a virtual warehouse in the lake, then creating Spark and Hive refiners to implement the existing SQL stored procedures.
  • Created Java-based Spark refiners to replace existing SQL stored procedures (a minimal sketch of this refiner pattern follows this list).
  • Created Hive refiners for simple UNIONs and JOINs.
  • Executed Hive queries using Spark SQL within the Spark environment.
  • Used REST services in Java and Spring to expose data in the lake.
  • Automated the triggering of Data Lake REST API calls using Unix shell scripting and Perl.
  • Created reconciliation jobs for validating data between source and lake.
  • Used Scala to test DataFrame transformations and debug data issues.
  • Redesigned and implemented the Scala REPL (read-evaluate-print loop) to integrate tightly with other IDE features in Eclipse.
  • Used Avro format for staging data and ORC for final repository.
  • Used Sqoop import and export functionalities to handle large data set transfer between Sybase database and HDFS.
  • Tuned Hive queries and Pig scripts to improve performance.
  • Involved in creating Oozie workflow and coordinator jobs to kick off jobs based on time and data availability.
  • Used Eclipse and Ant to build the application.
  • Performed unit testing and integration testing using the JUnit framework.
  • Responsible to manage data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
  • Built reusable Hive UDF libraries for business requirements, which enabled business analysts to use these UDFs in Hive queries.
  • Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster.
  • Resolved defects found while testing the new application and existing applications.
  • Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.
  • Implemented a near-real-time data pipeline using a framework based on Kafka, Spark and MemSQL.
  • Implemented the EP Data Lake, which provides a platform to manage data in a central location so that anyone in the firm can rapidly query, analyze or refine the data in a standard way.
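
The refiner pattern referenced above can be sketched in a few lines of Scala (the project's refiners were Java-based; this is an illustrative equivalent). Staged Avro data is registered as temporary views, the UNION/JOIN logic that previously lived in a SQL stored procedure is expressed in Spark SQL, and the result lands in the ORC final repository. Paths, table and column names are hypothetical, and reading Avro assumes the spark-avro package is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object AccountRefiner {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("account-refiner")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Avro staging datasets in the lake.
    spark.read.format("avro").load("/lake/staging/accounts_current")
      .createOrReplaceTempView("accounts_current")
    spark.read.format("avro").load("/lake/staging/accounts_history")
      .createOrReplaceTempView("accounts_history")

    // The UNION/JOIN logic that previously lived in a SQL stored procedure.
    val refined = spark.sql(
      """
        |SELECT a.account_id, a.branch_id, b.balance
        |FROM (
        |  SELECT account_id, branch_id FROM accounts_current
        |  UNION ALL
        |  SELECT account_id, branch_id FROM accounts_history
        |) a
        |JOIN lake.account_balances b ON a.account_id = b.account_id
      """.stripMargin)

    // Final repository is kept in ORC, as noted above.
    refined.write.mode("overwrite").format("orc").save("/lake/refined/accounts")

    spark.stop()
  }
}
```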

Environment: Hadoop, HDFS, Pig, Hive, Spark, Scala, Oozie, Sqoop, HBase, Sybase, Java, Kafka, UNIX, Maven, JUnit, SVN, MapR.

Hadoop Developer

Confidential

Responsibilities:

  • Worked with Sqoop incremental-load jobs to populate HAWQ external tables and load them into internal tables.
  • Created import and export jobs to copy data to and from HDFS using Sqoop.
  • Worked with Spark core, Spark Streaming and SQL modules of Spark.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala (see the sketch after this list).
  • Worked with Pig, the NoSQL database HBase and Sqoop for analyzing the Hadoop cluster as well as big data.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Created Hive tables and performed data analysis on them to meet the business requirements.
  • Developed a data pipeline using Spark and Hive to ingest, transform and analyze data.
  • Worked on data cleansing in order to populate Hive external and internal tables.
  • Experience in using SequenceFile, RCFile, Avro and HAR file formats.
  • Wrote Pig scripts to tokenize sensitive information using Protegrity.
  • Supported and built the Data Science team's projects on Hadoop.
  • Used Flume to dump the application server logs into HDFS.
  • Automated backups with Linux shell scripts to transfer data to an S3 bucket.
  • Experience in working with the NoSQL database HBase for real-time data analytics.
  • Hands on experience working as a production support engineer.
  • Worked on RCA (root cause analysis) documentation.
  • Automated incremental loads to load data into production cluster.
  • Ingested data from various file systems into HDFS using Unix command-line utilities.
  • Hands on experience in moving data from one cluster to another using DistCp.
  • Experience in reviewing Hadoop log files to detect failures.
  • Worked on epic user stories and delivered them on time.
  • Worked on the data ingestion part of the malicious-intent model and automated incremental jobs to run on a daily basis.
  • Hands on experience in Agile and scrum methodologies.
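
The Hive-to-Spark conversion mentioned in this list can be illustrated with a minimal Scala sketch; the sales.orders source table, its columns and the analytics.region_revenue target are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL (illustrative):
    //   SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    //   FROM sales.orders
    //   WHERE order_date >= '2017-01-01'
    //   GROUP BY region;

    // Equivalent DataFrame transformations:
    val result = spark.table("sales.orders")
      .filter(col("order_date") >= lit("2017-01-01"))
      .groupBy("region")
      .agg(count(lit(1)).alias("orders"), sum("amount").alias("revenue"))

    // Persist into a Hive table partitioned by load date for downstream use.
    result.withColumn("load_date", current_date())
      .write.mode("overwrite")
      .partitionBy("load_date")
      .saveAsTable("analytics.region_revenue")

    spark.stop()
  }
}
```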

Environment: MapReduce, HDFS, Hive, HBase, Sqoop, Pig, Flume, Oracle 11/10g, DB2, Teradata, MySQL, HAWQ, PL/SQL, Java, Linux, Shell Scripting, SQL Developer, HP ALM.

Hadoop Developer

Confidential

Responsibilities:

  • Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and pre-processing.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Used Multithreading, synchronization, caching and memory management.
  • Used Java application development skills with Object Oriented Analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
  • Proactively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Built Big Data clusters using the Apache Spark architecture for analytics.
  • Developed Pig Latin scripts for the analysis of semi-structured data and developed industry-specific UDFs (user-defined functions).
  • Created Hive tables, loaded data into them and wrote Hive UDFs (a sample UDF sketch follows this list).
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Wrote shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Involved in loading data from UNIX file system to HDFS, configuring Hive and writing Hive UDFs.
  • Utilized Java and MySQL from day to day to debug and fix issues with client processes.
  • Managed and reviewed log files.
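
Hive UDFs of this kind are plain JVM classes, so a sketch works equally well in Scala as in Java; the class name and normalization logic below are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: normalize free-text device identifiers in log records
// before they are loaded into Hive tables.
class NormalizeDevice extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val cleaned = input.toString.trim.toLowerCase
    new Text(if (cleaned.isEmpty) "unknown" else cleaned)
  }
}
```

Once packaged into a jar, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize_device AS 'NormalizeDevice' before use in queries.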

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, MongoDB, Flume, HTML, XML, SQL, MySQL, Core Java, Eclipse, Shell scripting, UNIX.

Hadoop Developer

Confidential

Responsibilities:

  • Transferred purchase transaction details from legacy systems to HDFS.
  • Developed Java MapReduce programs to transform log data into a structured form and derive user location, age group and time spent.
  • Developed Pig UDFs for manipulating the data as per the business requirements and worked on developing custom Pig loaders.
  • Collected and aggregated large amounts of weblog data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
  • Worked on the ingestion of files into HDFS from remote systems using MFT (Managed File Transfer).
  • Experience in monitoring and managing Cassandra cluster.
  • Analyzed the weblog data using HiveQL and integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Wrote MapReduce jobs to parse the weblogs stored in HDFS.
  • Developed services to run the MapReduce jobs as required.
  • Imported and exported data between HDFS and an Oracle 10.2 database using Sqoop.
  • Extracted files from the NoSQL database Cassandra through Sqoop and placed them in HDFS for processing.
  • Responsible for managing data coming from different sources.
  • Analyzed the data using Pig to extract the number of unique patients per day and the most purchased medicines.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Wrote UDFs for Hive and Pig that helped spot market trends (a representative Pig UDF sketch follows this list).
  • Good knowledge of running Hadoop streaming jobs to process terabytes of XML-format data.
  • Analyzed the functional specifications.
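
The Hive and Pig UDF work noted above can be sketched with a minimal Pig EvalFunc written in Scala (Pig UDFs are ordinary JVM classes); the class name and the age-bucketing logic are hypothetical.

```scala
import org.apache.pig.EvalFunc
import org.apache.pig.data.Tuple

// Hypothetical EvalFunc: bucket a customer's age into a coarse age group,
// useful when summarizing purchase logs by demographic.
class AgeGroup extends EvalFunc[String] {
  override def exec(input: Tuple): String = {
    if (input == null || input.size() == 0 || input.get(0) == null) return null
    val age = input.get(0).toString.toInt
    if (age < 18) "under-18"
    else if (age < 35) "18-34"
    else if (age < 55) "35-54"
    else "55+"
  }
}
```

In a Pig script the jar would be registered with REGISTER and the function invoked like a built-in, e.g. GENERATE AgeGroup(age).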

Environment: Hadoop, HDFS, Pig, Hive, Tez, Accumulo, Flume, Sqoop, Oozie, Cassandra.

Java Developer

Confidential

Responsibilities:

  • Involved in preparing functional definition documents and in discussions with business users and the testing team to finalize the technical design documents.
  • Enhanced the Web Application using Struts.
  • Created business logic and application components in the Struts framework using JSPs and Servlets.
  • Documented the code using Javadoc-style comments.
  • Wrote client-side validation using the Struts Validator framework and JavaScript.
  • Wrote unit test cases for different modules and resolved the test findings.
  • Implemented SOAP using Web services to communicate with other systems.
  • Wrote JSPs, Servlets and deployed them on WebLogic Application server.
  • Developed automated Build files using Maven.
  • Used Subversion for version control and log4j for logging errors.
  • Wrote Oracle PL/SQL Stored procedures, triggers.
  • Helped the production support team resolve trouble reports.
  • Involved in Release Management and Deployment Process.

Environment: Java, J2EE, Struts, JSP, Servlets, JavaScript, Hibernate, SOAP, WebLogic, Log4j, Maven, CVS, PL/SQL, Oracle, Windows
