
Big Data/Hadoop Developer Resume


Charlotte, NC

PROFESSIONAL SUMMARY:

  • 6+ years of IT experience in software development, big data management, data modeling, data integration, and implementation and testing of enterprise-class systems spanning big data frameworks, advanced analytics, and Java/J2EE technologies.
  • 3+ years of hands-on experience with Hadoop components and MapReduce programming for parsing and populating tables over terabytes of data.
  • Extensive use of Sqoop, Flume, and Oozie for data ingestion into HDFS and the Hive warehouse.
  • Experienced with major Hadoop ecosystem projects such as Pig, Hive, and HBase.
  • Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
  • Hands-on with performance-improvement techniques for data processing in Hive, Impala, Spark, Pig, and MapReduce, including but not limited to dynamic partitioning, bucketing, and file compression.
  • Experience in data processing such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Expertise in ingesting data into Solr from HBase.
  • Extensively worked on debugging using Eclipse debugger.
  • Experienced in importing data from various sources using StreamSets.
  • Experience with Cloudera, Hortonworks & MapR Hadoop distributions.
  • Strong work ethic with desire to succeed and make significant contributions to the organization.
  • Strong problem-solving skills, effective communication, interpersonal skills and a good team player.
  • Have the motivation to take independent responsibility as well as ability to contribute and be a productive team member.
  • A pleasing personality with the ability to build great rapport with clients and customers.
  • Excellent verbal and written communication, paired with strong presentation and interpersonal skills.
  • Strong leadership qualities, backed by a solid track record as a team player.
  • Adept with the latest business/technological trends.

Spark & Transaction Processing

  • Hands on experience with Spark-SQL for various business use-cases.
  • Used Spark-SQL, Scala APIs for querying & transformation of data residing in Hive.
  • Used Python for Spark SQL jobs to process data quickly.
  • Replaced existing MapReduce jobs with Spark Streaming and Spark data transformations for more efficient data processing (see the sketch after this list).
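
A minimal sketch of the kind of Spark SQL job described above, written in Python (PySpark). The application name, database, table, and column names (sales.transactions, txn_date, amount) are hypothetical placeholders, not the actual project objects.

```python
from pyspark.sql import SparkSession

# SparkSession with Hive support so Spark SQL can query Hive-managed tables.
spark = (SparkSession.builder
         .appName("hive-spark-sql-example")
         .enableHiveSupport()
         .getOrCreate())

# Query data residing in Hive with Spark SQL and aggregate it.
daily_totals = spark.sql("""
    SELECT txn_date, SUM(amount) AS total_amount
    FROM sales.transactions
    WHERE txn_date >= '2017-01-01'
    GROUP BY txn_date
""")

# Persist the aggregated result back to Hive for downstream reporting.
daily_totals.write.mode("overwrite").saveAsTable("sales.daily_totals")
```

The same DataFrame operations are available through the Scala API; only the session setup and lambda syntax differ.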

Core Competencies

  • Hadoop Development & Troubleshooting
  • Data Analysis
  • Data Visualization & Reporting in Tableau
  • Real-time Streaming using Spark
  • MapReduce Programming
  • Performance Tuning of Hive & Impala
  • Ingesting data from HBase to Solr
  • Data import using StreamSets

TECHNICAL SKILLS:

Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, Spark, ZooKeeper, Solr, StreamSets.

Apache Spark: Spark, Spark SQL, Spark Streaming, Scala.

ETL Tools: Informatica with Hadoop connector, Pentaho, Alteryx

Programming & Scripting Languages: Java, C, Scala, SQL, Unix Shell Scripting, Python

Java/Web Technologies: jQuery, JSP, Servlets.

SQL Databases: Oracle, SQL Server 2012, SQL Server 2008 R2, DB2, Teradata

NoSQL: MongoDB, HBase.

Development tools: Maven, Eclipse, IntelliJ, PyCharm

PROFESSIONAL EXPERIENCE:

Confidential, Charlotte, NC

Big Data/Hadoop Developer

Responsibilities:

  • Developed and supported MapReduce programs running on the cluster.
  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Created Hive tables and worked on them using HiveQL.
  • Involved in installing Hadoop Ecosystem components.
  • Validated NameNode and DataNode status in an HDFS cluster.
  • Handled 2 TB of data volume and implemented the same in Production.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
  • Managed and reviewed the Hadoop log files.
  • Responsible for managing data coming from various sources.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Wrote MapReduce jobs using the Java API (see the streaming-style sketch after this list).
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Developed UDFs for Pig data analysis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Used JUnit for unit testing and Continuum for integration testing.
  • Worked hands on with ETL process.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, set up a high-availability cluster, and integrated Hive with existing applications.
  • Configured Ethernet bonding on all nodes to double the network bandwidth.
  • Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
  • Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Imported source/target tables from the respective SAP R/3 and BW systems, created reusable transformations (Joiner, Router, Lookup, Rank, Filter, Expression, and Aggregator) inside mapplets, and built new mappings using the Designer module of Informatica PowerCenter to implement the business logic and load the customer healthcare data incrementally and in full.
  • Created complex mappings using Unconnected Lookup, Aggregator, and Router transformations to populate target tables efficiently.
  • Optimized the mappings using various optimization techniques and used the Debugger to test and fix existing mappings.
  • Updated mappings, sessions, and workflows as part of ETL changes.
  • Modified existing ETL code and documented the changes.
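
The MapReduce jobs above were written against the Java API; as a minimal, hedged illustration of the same map/reduce pattern in Python, the sketch below shows a Hadoop Streaming mapper and reducer that count records per user from tab-separated log lines. The field layout, file names, and the per-user metric are assumptions for illustration, not the project's actual job.

```python
#!/usr/bin/env python
# mapper.py - Hadoop Streaming mapper (illustrative; field layout is assumed).
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 2:
        user_id = fields[1]          # assumption: second column is the user id
        print("%s\t1" % user_id)
```

```python
#!/usr/bin/env python
# reducer.py - sums the counts per user id (keys arrive grouped and sorted).
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key == current_key:
        count += int(value)
    else:
        if current_key is not None:
            print("%s\t%d" % (current_key, count))
        current_key, count = key, int(value)

if current_key is not None:
    print("%s\t%d" % (current_key, count))
```

A pair like this is submitted with the hadoop-streaming JAR, passing the two scripts as mapper and reducer along with the HDFS input and output paths; the Java API version expresses the same logic as Mapper and Reducer classes.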

Environment: Java, Hadoop, MapReduce, HDFS, Hive, Pig, Linux, XML, Eclipse, Cloudera CDH3/4 Distribution, Informatica 9.1

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:

  • Created external Hive tables to move data from different sources into the Cloudera cluster.
  • Kept track of the data once it was loaded, with updates on a weekly and daily basis.
  • Performed SQL joins across Hive tables to consolidate them into a single table.
  • Ingested data from Hive to HBase and HBase to Solr using Spark.
  • Ingested data from various sources, such as JDBC databases, into Hive using StreamSets and Sqoop jobs.
  • Imported data from Hive to Solr using StreamSets.
  • Set up near-real-time indexing into Solr as an automated process after scheduling the job.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Involved in developing ETL data pipelines for real-time streaming data using Kafka and Spark.
  • Worked on a POC to pull in third-party data, used Spark SQL to create a schema RDD, and loaded the structured data into Hive tables.
  • Imported and exported data between HDFS, Pig, Hive, and HBase using Sqoop.
  • Managed and reviewed Hadoop log files.
  • Worked on loading and transforming large sets of structured, semi-structured, and unstructured data into the Hadoop system.
  • Responsible for managing data coming from different data sources.
  • Developed a Flume ETL job to handle data from an HTTP source with HDFS as the sink.
  • Worked closely with admins to set up Kerberos authentication.
  • Wrote test cases to test software throughout development cycles, including functional testing, unit testing, and continuous integration.
  • Managed operations, monitoring, and troubleshooting for all Hadoop development and production issues.
  • Developed and debugged performance testing and tuning in the existing application.
  • Prepared detailed design specification documents and implemented business rules.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
  • Interacted closely with web developers on application usage and on pulling data from Solr and HBase to populate the front end.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (see the streaming sketch after this list).
  • Used Spark SQL to load JSON data, create a schema RDD, and load it into Hive tables, and handled structured data with Spark SQL (see the sketch after this list).
  • Implemented Spark SQL queries using Python for fast processing of the data.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Exported analyzed data to relational databases using Sqoop for visualization to generate reports for the BI team.
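
A minimal sketch of the Spark SQL pattern mentioned above (loading JSON, deriving a schema, and persisting it to Hive), written in Python. The HDFS path, view name, column names, and target table are placeholders for illustration only.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("json-to-hive-example")
         .enableHiveSupport()
         .getOrCreate())

# Infer the schema directly from the JSON files (path is a placeholder).
events = spark.read.json("hdfs:///data/raw/third_party/events/")

# Register a temporary view so the data can be shaped with SQL.
events.createOrReplaceTempView("raw_events")

cleaned = spark.sql("""
    SELECT event_id, event_type, CAST(event_ts AS TIMESTAMP) AS event_ts
    FROM raw_events
    WHERE event_id IS NOT NULL
""")

# Persist the structured result as a Hive table for downstream queries.
cleaned.write.mode("append").saveAsTable("staging.third_party_events")
```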
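
The Kafka-to-HDFS streaming described above was implemented in Scala; the sketch below shows an equivalent flow in Python, assuming a Spark 2.x cluster with the spark-streaming-kafka-0-8 integration available. The broker address, topic name, and output path are placeholders.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-to-hdfs-example")
ssc = StreamingContext(sc, 30)  # 30-second micro-batches

# Direct stream from Kafka; broker and topic are placeholder names.
stream = KafkaUtils.createDirectStream(
    ssc, ["events"], {"metadata.broker.list": "kafka-broker:9092"})

# Each Kafka record arrives as a (key, value) pair; keep the message body
# and write every micro-batch out to HDFS as text files.
stream.map(lambda kv: kv[1]) \
      .saveAsTextFiles("hdfs:///data/streams/events/batch")

ssc.start()
ssc.awaitTermination()
```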

Environment: Hive, HDFS, HBase, Solr, StreamSets, Spark, Kafka, Sqoop, Scala, IntelliJ, Python, PyCharm.

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:

  • Developed data pipelines using Sqoop and Flume to store data in HDFS for further processing with Spark.
  • Created Hive tables with periodic backups and wrote complex Hive/Impala queries to run on Impala.
  • Implemented partitioning and bucketing in Hive, using file formats and compression techniques with optimizations (see the partitioning sketch after this list).
  • Wrote a Python script to convert Autosys jobs and HDFS directory paths from old standards to new standards.
  • Wrote Python scripts to collect the YARN job list for performance metrics (see the sketch after this list).
  • Created Hive generic UDFs to process business logic that varies based on policy.
  • Experience customizing the MapReduce framework at different levels, such as input formats, data types, custom SerDes, and partitioners.
  • Pushed the data to Windows mount location for Tableau to import it for reporting.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Implemented Spark to migrate MapReduce jobs into Spark RDD transformations and streamed data using Spark Streaming.
  • Worked on joins to create Hive look up tables.
  • Developed Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
  • Analyzed large data sets by running Hive query scripts.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed Hive scripts that take dynamic parameters via hivevar.
  • Created partitioned tables in Hive for better performance and faster querying.
  • Configured build scripts for multi module projects with Maven.
  • Automated the process of scheduling workflow using Oozie and Autosys.
  • Prepared Unit test cases and performed unit testing.
  • Created external and partitioned tables in Hive for querying purposes.
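
A minimal sketch of the partitioning work described above, using Spark SQL with Hive support in Python. The database, table, and column names are placeholders; bucketing and compression settings are omitted, and the dynamic-partition settings shown are the standard Hive ones.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-example")
         .enableHiveSupport()
         .getOrCreate())

# Partitioned target table (names are placeholders).
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.claims_by_day (
        claim_id  STRING,
        member_id STRING,
        amount    DOUBLE
    )
    PARTITIONED BY (claim_date STRING)
    STORED AS PARQUET
""")

# Dynamic partitioning lets one insert populate many date partitions at once.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE analytics.claims_by_day PARTITION (claim_date)
    SELECT claim_id, member_id, amount, claim_date
    FROM staging.claims_raw
""")
```

Partition pruning on claim_date is what gives the faster querying mentioned above: queries filtering on the partition column read only the matching directories.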
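
A sketch of the kind of Python script used to pull the YARN job list for performance metrics, here via the ResourceManager REST API; the ResourceManager host/port, the application states requested, and the output file are assumptions.

```python
#!/usr/bin/env python
"""Collect YARN application details for basic performance metrics."""
import csv
import requests

# Placeholder ResourceManager address; the /ws/v1/cluster/apps endpoint
# returns {"apps": {"app": [...]}}.
RM_APPS_URL = "http://resourcemanager.example.com:8088/ws/v1/cluster/apps"

def fetch_apps(states="FINISHED"):
    resp = requests.get(RM_APPS_URL, params={"states": states}, timeout=30)
    resp.raise_for_status()
    apps = resp.json().get("apps") or {}
    return apps.get("app", [])

def write_metrics(apps, path="yarn_job_metrics.csv"):
    fields = ["id", "name", "queue", "state", "finalStatus", "elapsedTime"]
    with open(path, "w") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        for app in apps:
            writer.writerow({k: app.get(k) for k in fields})

if __name__ == "__main__":
    write_metrics(fetch_apps())
```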

Environment: Hadoop, Cloudera, HDFS, Hive, Spark, Sqoop, Flume, Java, Scala, Shell scripting, Impala, Eclipse, Tableau, MySQL.

Confidential

Java Developer

Responsibilities:

  • Involved in requirements analysis, design, coding, and testing.
  • Developed the application following Agile Scrum methodology.
  • Developed and implemented the MVC architectural pattern using the Struts framework, including JSP, Servlets, EJB, and Action classes.
  • Performed object-oriented analysis and design using UML, including development of class, sequence, and state diagrams, and implemented these diagrams in Microsoft Visio.
  • Involved in writing client-side validations using JavaScript, CSS.
  • Designed and developed the UI using Struts view components HTML, CSS and JavaScript.
  • Developed messaging components using the JMS API from the J2EE stack.
  • Used Oracle as the database and Toad for query execution; involved in writing SQL scripts and PL/SQL code for procedures and functions.
  • Involved in designing test plans, test cases and overall Unit testing of the system.
  • Prepared documentation and participated in preparing the user manual for the application.

Environment: Java, jQuery, JUnit, Servlets, Spring 2.0, WebLogic, Eclipse, JSP, Windows XP, HTML, CSS, JavaScript, and XML.
