
Hadoop Developer Resume


New Brunswick, New Jersey

SUMMARY

  • Over 5 years of IT industry and software development experience, including 4+ years in Hadoop development.
  • Experience in developing MapReduce programs with Apache Hadoop to analyze big data according to requirements.
  • Experienced in major Hadoop ecosystem projects such as Pig, Hive, and HBase, and in monitoring them with Cloudera Manager.
  • Hands-on experience working with NoSQL databases, including HBase and Cassandra, and their integration with the Hadoop cluster.
  • Experience in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements (see the sketch after this list).
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Good knowledge in using job scheduling and monitoring tools like Oozie and Zookeeper.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Apache, Cloudera, and AWS.
  • Experienced in designing, building, and deploying a multitude of applications utilizing almost all of the AWS stack (including EC2, Route 53, S3, RDS, DynamoDB, SQS, IAM, and EMR), focusing on high availability, fault tolerance, and auto-scaling.
  • Experienced in MVC (Model View Controller) architecture and various J2EE design patterns like singleton and factory design patterns.
  • Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala) and NoSQL databases like MongoDB, HBase, and Cassandra.
  • Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
  • Hands-on experience in configuring and administering Hadoop clusters using major Hadoop distributions.
  • Hands-on experience in solving software design issues by applying design patterns including the Singleton Pattern, Business Delegator Pattern, Controller Pattern, MVC Pattern, Factory Pattern, Abstract Factory Pattern, DAO Pattern, and Template Pattern.
  • Good experience with design, coding, debugging, reporting, and data analysis using Python and Python libraries to speed up development.
  • Good Working experience in using different Spring modules like Spring Core Container Module, Spring Application Context Module, Spring MVC Framework module, Spring ORM Module in Web applications.
  • Used jQuery to select and manipulate HTML elements and to implement AJAX in web applications; used available plug-ins to extend jQuery functionality.
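
A minimal sketch of the kind of Spark batch job the Spark/Scala bullet refers to, built from higher-order functions (map, filter, reduceByKey). The input path and record layout are assumptions for illustration; the same transformations can also be run line by line from spark-shell for interactive analysis.

    import org.apache.spark.sql.SparkSession

    object StatusCodeCounts {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("status-code-counts")
          .getOrCreate()

        // Raw web-server logs on HDFS; path and field layout are assumed.
        val logs = spark.sparkContext.textFile("hdfs:///data/raw/access_logs/*")

        val countsByStatus = logs
          .map(_.split("\\s+"))            // tokenize each log line
          .filter(_.length > 8)            // drop malformed records
          .map(fields => (fields(8), 1L))  // assume field 8 holds the HTTP status
          .reduceByKey(_ + _)              // sum occurrences per status code

        countsByStatus.saveAsTextFile("hdfs:///data/agg/status_counts")
        spark.stop()
      }
    }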

PROFESSIONAL EXPERIENCE

HADOOP DEVELOPER

Confidential, New Brunswick, New Jersey

Responsibilities:

  • Worked directly with the Big Data Architecture Team which created the foundation of this Enterprise Analytics initiative in a Hadoop-based Data Lake.
  • Served as Technical Lead, Business Analyst, and Hadoop Developer.
  • Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large datasets to determine the optimal way to aggregate and report on them.
  • Developed simple to complex MapReduce jobs using Hive to cleanse and load downstream data.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop (an ingest sketch follows this list).
  • Exported the analyzed data from Hive tables to SQL databases using Sqoop for visualization and to generate reports for the BI team.
  • Extensively used Hive for data cleansing.
  • Created partitioned tables in Hive; managed and reviewed Hadoop log files.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (see the partitioned-table sketch after this list).
  • Used Unix bash scripts to validate files moved from the local Unix file system to HDFS.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data, and managed data coming from different sources.
  • Translated high-level design specifications into simple ETL coding and mapping standards.
  • Designed and customized data models for the data warehouse, supporting data from multiple sources in real time.
  • Involved in building the ETL architecture and source-to-target mappings to load data into the data warehouse.
  • Created mapping documents to outline data flow from sources to targets.
  • Involved in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.
  • Extracted data from flat files and other RDBMS databases into the staging area and populated it into the data warehouse.
  • Maintained stored definitions, transformation rules, and target definitions using Informatica Repository Manager.
  • Used various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure, and Union to develop robust mappings in the Informatica Designer.
  • Developed mapping parameters and variables to support SQL override.
  • Created mapplets to use them in different mappings.
  • Developed mappings to load data into staging tables and then into dimension and fact tables.
  • Used existing ETL standards to develop these mappings.
  • Worked on different workflow tasks such as Session, Event Raise, Event Wait, Decision, E-mail, Command, Worklet, Assignment, and Timer, as well as scheduling of the workflow.
  • Created sessions and configured workflows to extract data from various sources, transform it, and load it into the data warehouse.
  • Used Type 1 and Type 2 SCD mappings to update slowly changing dimension tables.
  • Extensively used SQL*Loader to load data from flat files into Oracle database tables.
  • Modified existing mappings for enhancements of new business requirements.
  • Used the Debugger to test the mappings and fix bugs.
  • Wrote UNIX shell scripts and pmcmd commands to FTP files from remote servers and to back up the repository and folders.
  • Involved in Performance tuning at source, target, mappings, sessions, and system levels.
  • Prepared migration document to move the mappings from development to testing and then to production repositories.
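
The Sqoop imports above are command-line jobs; as a hedged stand-in using the Spark/Scala stack listed in the Environment line, the same MySQL-to-HDFS movement can be sketched with Spark's JDBC data source. Hosts, credentials, table names, and paths below are placeholders, and the MySQL JDBC driver is assumed to be on the classpath.

    import org.apache.spark.sql.SparkSession

    object OrdersIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("orders-ingest")
          .getOrCreate()

        // Pull one RDBMS table; the Sqoop equivalent would be
        // `sqoop import --connect ... --table orders --target-dir ...`.
        val orders = spark.read
          .format("jdbc")
          .option("url", "jdbc:mysql://dbhost:3306/sales")
          .option("dbtable", "orders")
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .load()

        // Land the raw table on HDFS as Parquet for downstream Hive processing.
        orders.write.mode("overwrite").parquet("hdfs:///data/raw/orders")
        spark.stop()
      }
    }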
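
The partitioned Hive tables and reporting metrics above follow a standard create/insert/aggregate pattern; the sketch below runs the HiveQL through a Hive-enabled SparkSession, with database, table, and column names chosen purely for illustration.

    import org.apache.spark.sql.SparkSession

    object PartitionedOrdersReport {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("partitioned-orders-report")
          .enableHiveSupport()          // use the shared Hive metastore
          .getOrCreate()

        // Partitioned target table; STORED AS PARQUET keeps scans cheap.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS analytics.orders_clean (
            |  order_id    BIGINT,
            |  customer_id BIGINT,
            |  amount      DOUBLE)
            |PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET""".stripMargin)

        // Load one date partition from a staging table, dropping bad rows.
        spark.sql(
          """INSERT OVERWRITE TABLE analytics.orders_clean
            |PARTITION (load_date = '2024-01-01')
            |SELECT order_id, customer_id, amount
            |FROM staging.orders_raw
            |WHERE amount IS NOT NULL""".stripMargin)

        // Reporting metric computed against the partitioned data.
        spark.sql(
          """SELECT load_date, COUNT(*) AS orders, SUM(amount) AS revenue
            |FROM analytics.orders_clean
            |GROUP BY load_date""".stripMargin)
          .show()

        spark.stop()
      }
    }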

Environment: Spark, Scala, Eclipse, HBase, Spark SQL, Hive, Teradata, Hue, Spark Core, Linux, GitHub, AWS, JSON.

HADOOP DEVELOPER

Confidential, New York

Responsibilities:

  • Involved in complete project life cycle starting from design discussion to production deployment.
  • Installed Hadoop, MapReduce, HDFS, and AWS components, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Developed a job server (REST API, Spring Boot, Oracle DB) and job shell for job submission, job profile storage, and job data (HDFS) query/monitoring.
  • Implemented solutions for ingesting data from various sources and processing it using Big Data technologies such as Hive, Pig, Sqoop, HBase, MapReduce, etc.
  • Designed and developed a daily process for incremental import of raw data from DB2 into Hive tables using Sqoop.
  • Involved in debugging MapReduce jobs using the MRUnit framework and in optimizing MapReduce performance.
  • Extensively used HiveQL to query data in Hive tables and to load data into them.
  • Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest data into HDFS for analysis.
  • Used Oozie and Zookeeper for workflow scheduling and monitoring.
  • Effectively used Sqoop to transfer data from databases (SQL, Oracle) to HDFS, Hive.
  • Integrated Apache Storm with Kafka to perform web analytics, uploading clickstream data from Kafka to HDFS, HBase, and Hive (a streaming ingest sketch follows this list).
  • Designed Hive external tables using a shared metastore instead of Derby, with dynamic partitioning and buckets.
  • Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala (see the migration sketch after this list).
  • Designed and implemented ETL processes using Talend to load data into HDFS.
  • Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems/mainframes.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Enabled concurrent access to Hive tables with shared/exclusive locks by implementing ZooKeeper in the cluster.
  • Strongly recommended bringing in Elasticsearch and was responsible for its installation, configuration, and administration.
  • Implemented Scala and SQL for faster testing and processing of data, and streamed data in real time with Kafka.
  • Used Oozie Operational Services for batch processing and dynamic workflow scheduling, and created end-to-end data pipeline orchestration with Oozie.
  • Populated HDFS and Cassandra with massive amounts of data using Apache Kafka.
  • Involved in designing and developing Kafka- and Storm-based data pipelines with the infrastructure team.
  • Worked on major components in Hadoop Ecosystem including Hive, PIG, HBase, HBase-Hive Integration, Scala, Sqoop and Flume.
  • Developed Hive scripts, Pig scripts, and Unix shell scripts for all ETL loading processes and for converting files into Parquet in the Hadoop file system.
  • Worked with Oozie and Zookeeper to manage job workflow and job coordination in the cluster.
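
The clickstream work above pairs Kafka with Storm; as a hedged stand-in using the Spark Streaming entry from the Environment line, a Structured Streaming job that lands Kafka events on HDFS looks roughly like this. Broker, topic, and paths are placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

    import org.apache.spark.sql.SparkSession

    object ClickstreamToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("clickstream-to-hdfs")
          .getOrCreate()

        // Subscribe to the clickstream topic; key/value arrive as bytes.
        val clicks = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "clickstream")
          .option("startingOffsets", "latest")
          .load()
          .selectExpr("CAST(key AS STRING) AS user_id",
                      "CAST(value AS STRING) AS event")

        // Append raw events to HDFS; the checkpoint makes the file sink exactly-once.
        clicks.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/raw/clickstream")
          .option("checkpointLocation", "hdfs:///checkpoints/clickstream")
          .start()
          .awaitTermination()
      }
    }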
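
The MapReduce-to-Spark migration mentioned above typically collapses a mapper/reducer pair into one or two DataFrame transformations. A sketch under an assumed log layout, where the old job emitted (page, 1) pairs in the mapper and summed them in the reducer:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object PageViewsMigration {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("page-views-migration")
          .getOrCreate()
        import spark.implicits._

        val pageViews = spark.read.textFile("hdfs:///data/raw/access_logs/*")
          .filter(_.split("\\s+").length > 6)   // drop malformed lines
          .map(_.split("\\s+")(6))              // mapper step: emit the request path (position assumed)
          .toDF("page")
          .groupBy("page")
          .agg(count(lit(1)).as("views"))       // reducer step: count views per page

        pageViews.write.mode("overwrite").parquet("hdfs:///data/agg/page_views")
        spark.stop()
      }
    }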

Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Python, Spark, Spark Streaming, Spark SQL, AWS EMR, AWS S3, AWS Redshift, Scala, Java, Oozie, Flume, HBase, Nagios, Ganglia, Hue.
