Hadoop Developer Resume

Plymouth, MN

SUMMARY:

  • 6 years of professional experience, including development, deployment, maintenance, and support of various projects in large organizations.
  • Strong experience with Big Data and Hadoop technologies with excellent knowledge of Hadoop ecosystem: Hive, Spark, Kafka, Sqoop, Pig, HBase, Oozie, and Talend.
  • Deep knowledge of Hadoop architecture (HDFS, YARN, MapReduce) and its internal operations.
  • Worked with Big Data Hadoop distributions like MapR, Cloudera, and Hortonworks.
  • Experience in AWS cloud environment.
  • Hands-on experience with VPC, EC2, EMR, S3, Redshift, CloudWatch, and SNS.
  • Experienced with Spark; performed various transformations and actions on large datasets using RDDs.
  • Experienced in using Spark to improve the performance of existing Hadoop algorithms with SparkContext, Spark SQL, DataFrames, and Spark on YARN.
  • Experience in capturing data and importing it to HDFS, using Kafka for semi-structured data and Sqoop for existing relational databases.
  • Experience in importing real-time data to Hadoop using Kafka and implementing Oozie jobs for daily imports.
  • Analyzed large data sets using Hive queries and Pig scripts.
  • Strong understanding of partitioning and bucketing concepts in Hive.
  • Experienced in job workflow scheduling and monitoring tools like Oozie.
  • Worked with Talend Open Studio data integration, Big Data integration, and data preparation tools. Designed and ran ETL jobs using Talend Open Studio.
  • Imported and exported data between HDFS and RDBMS using Sqoop.
  • Exposure to file formats such as SequenceFile, ORC, Parquet, and JSON.
  • Worked on NoSQL databases including HBase.
  • Good understanding of HDFS design, daemons, federation, and high availability (HA).
  • Well versed in designing and implementing MapReduce jobs using Java in Eclipse to solve real-world scaling problems.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
  • Basic Knowledge of UNIX and shell scripting.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with problem-solving and leadership skills.

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, Spark, Scala, Kafka, MapReduce, HBase, Pig, Hive, Sqoop, Oozie, Talend.

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, HTML, CSS.

IDEs & Tools: Eclipse, SVN, Apache Ant, Log4j, Maven, JUnit, WinSCP.

NoSQL: HBase.

DB Languages: SQL.

Application Server: Tomcat

Programming languages: C, Java, shell scripting.

Operating Systems: Linux, Windows XP, Windows 7, MS-DOS.

PROFESSIONAL EXPERIENCE:

Confidential, Plymouth, MN

Hadoop Developer

Responsibilities:

  • Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
  • Implemented Kafka for streaming data, then filtered and processed the data.
  • Developed a data pipeline using Kafka, Sqoop, and Hive to ingest transactional data into HDFS for analysis.
  • Developed an ingestion framework to read mainframe files and create Hive snapshot tables on EDP.
  • Created Hive tables based on business requirements. Wrote Hive queries and UDFs, and implemented partitioning and bucketing for efficient data access.
  • Created Hive tables in Parquet and ORC file formats with Snappy and Gzip compression.
  • Developed Spark code using Scala and Spark SQL for faster processing. Responsible for ingesting data into EDP.
  • Developed workflows using Oozie to automate the tasks.
  • Involved in QA, test data creation, and unit testing activities.
  • Involved in design, development and testing phases of Software Development Life Cycle.
  • Utilized Agile Scrum Methodology to help manage and organize a team with regular code review sessions.

Environment: Hadoop, Spark, Scala, Kafka, YARN, Hive, Oozie, Sqoop, Hortonworks.

Confidential, Eden Prairie, MN

Hadoop Developer

Responsibilities:

  • Analyzed and wrote Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from the edge node to HDFS using shell scripting.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive.
  • Developed Pig scripts using Pig Latin.
  • Handled importing data from web logs, MySQL, and various other data sources using Sqoop.
  • Designed and created ETL jobs in Talend to load huge volumes of data into HBase, the Hadoop ecosystem, and relational databases.
  • Developed a test automation framework using Talend for record count checks, duplicate checks, field-level validation, and SCD2 validation.
  • Developed Spark code and Spark SQL to extract data from the data lake to our tenant, replicating Talend functionality.
  • Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
  • Implemented Spark jobs in Scala, using DataFrames and the Spark SQL API for faster data processing.
  • Wrote shell scripts for job automation.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.

Environment: Apache Hadoop, Apache Spark, Scala, Spark SQL, MapReduce, HDFS, Hive, Java, Pig, HBase, Teradata, Talend, Linux, XML, MySQL, MySQL Workbench, Java 6, Eclipse, PL/SQL, SQL connector, MapR.

Confidential, Basking Ridge, NJ

Hadoop Developer

Responsibilities:

  • Involved in analyzing data coming from various sources and creating meta-files and control files to ingest the data into the data lake.
  • Involved in configuring batch jobs to ingest the source files into the data lake.
  • Created several jobs in the Talend ETL tool to transform source files.
  • Used Pig to transform the data in HDFS to fit the requirements.
  • Created several Pig UDFs for the enrichment engine, which were used to enrich the data.
  • Developed Hive queries to load data to HBase.
  • Leveraged Hive queries to create ORC tables.
  • Created ORC tables to improve reporting performance.
  • Worked extensively with Hive to create, alter, and drop tables, and wrote Hive queries.
  • Created and altered HBase tables on top of data residing in the data lake.
  • Designed and developed reference table engine frameworks in Talend using Hadoop tools such as HDFS, Hive, HBase, and MapReduce.
  • Experience with Talend components for transformation, file processing, Java, Unix, database operations, and logging.
  • Worked closely with system analysts and architects to design and develop Talend jobs to fit the business requirements.
  • Experience in scheduling jobs in Talend.
  • Worked in an agile methodology using Rally.

Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Sqoop, MapR, Talend, Core Java, Eclipse, Linux.

Confidential, Burlington, MA

Hadoop Developer

Responsibilities:

  • Involved in migrating data from slough to AWS using ETL.
  • Responsible for creating Hive tables based on business requirements.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Worked on AWS cloud environment.
  • Hands-on experience with VPC, EC2, S3, EMR, Redshift, Data Pipeline, CloudWatch, and SNS.
  • Demonstrated analytical and problem-solving skills, particularly those that apply to a "Big Data" environment.
  • Developed scripts and improved the performance of the project by automating data management from end to end and embedding monitoring logic using CloudWatch and SNS.
  • Worked on EMR to convert the raw data to derived format and also to transfer data from one server to another.
  • Used SQL Workbench to load and aggregate data from S3 into Redshift.
  • Imported and exported data into HDFS and Hive using Flume.
  • Tested the performance of a Tableau dashboard by measuring its response time.
  • Expert knowledge of developing and debugging in Java/J2EE.
  • Worked hands-on with ETL processes using Python and Java.
  • Migrated all the on-premises data from Salo, Oracle, and MySQL to Amazon Redshift using Python and the Attunity tool on an Amazon EC2 instance.
  • Developed data pipelines to process the data from the source systems directly into Redshift database.
  • Wrote MapReduce jobs and integrated them with Oozie workflows for batch processing on huge datasets.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive.
  • Exported the result set from Hive to MySQL using Sqoop after processing the data.
  • Utilized Agile Scrum Methodology to help manage and organize a team with regular code review sessions and daily stand-ups.

Environment: Hadoop, HDFS, Hue, MapReduce, Hive, Pig, Sqoop, AWS, VPC, EC2, S3, EMR, Redshift, Data Pipeline, CloudWatch, SNS, Splunk, SQL Server, MySQL, HBase, MongoDB, UNIX Shell Scripting.

Confidential, Roseville, CA

Hadoop Developer

Responsibilities:

  • Responsible for building data solutions in Hadoop using Cascading frameworks.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Worked hands on with ETL process.
  • Upgraded the Hadoop cluster from CDH3 to CDH4 and integrated Hive with existing applications.
  • Configured Ethernet bonding for all Nodes to double the network bandwidth.
  • Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
  • Used Python and shell scripts to automate the end-to-end ELT process.
  • Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Hive queries to process the data and generate data cubes for visualization.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Teradata, Cloudera Manager, Pig, Sqoop, Oozie, Python.

Confidential, Dallas, TX

Java/J2EE Developer

Responsibilities:

  • Involved in designing and developing modules at both Client and Server Side.
  • Developed the UI using JSP, JavaScript and HTML.
  • Responsible for validating the data at the client side using JavaScript.
  • Interacted with external services to get user information using SOAP web service calls.
  • Developed web components using JSP, Servlets and JDBC.
  • Designed the controller using Servlets.
  • Accessed the Oracle backend database using JDBC.
  • Developed UNIX shell scripts to automate various tasks.
  • Developed user and technical documentation.

Environment: Java, Servlets, JSP, JavaScript, JDBC, Unix Shell scripting, HTML, Eclipse, Oracle 8i, WebLogic.
