
Hadoop Developer Resume

SUMMARY:

  • Professional software developer with 10+ years of experience in the IT industry, including 4+ years in Hadoop/Big Data technologies and 2 years of extensive experience in Java, Python, database development, and data analytics.
  • Experience in using Cloudera and Hortonworks distributions.
  • Experience in analyzing data using Spark SQL, Hive QL and custom MapReduce programs in Java.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Experience with file formats such as Avro, Parquet, RCFile, and ORC, and compression codecs such as Snappy and bzip2.
  • Experience with Oozie scheduler in setting up workflow jobs with actions that run Hive and Sqoop jobs.
  • Hands-on experience with relational databases such as Teradata, Oracle, and MySQL.
  • Strong Experience in Unit Testing and System testing in Big Data.
  • Hands on experience with Spark using Scala and Python.
  • Hands on experience working with JSON files.
  • Hands-on experience with Spark architecture and its integrations, such as Spark SQL and DataFrames.
  • Involved in database design, creating Tables, Views, Stored Procedures, Functions, Triggers and Indexes.
  • Experience in developing complex SQL queries, unions, and multi-table joins, and experience with views.
  • Involved in creating Hive tables, loading and analyzing data using hive queries.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Experience using Flume to collect, aggregate, and store weblog data from sources such as web servers, mobile devices, and network devices, and push it to HDFS.
  • Migrated from Flume to Spark for real-time data and developed a Spark Streaming application in Java to consume data from Kafka and push it into Hive.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, efficient joins, and transformations during the ingestion process itself.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
  • Experience in extraction, transformation, and loading (ETL) of data from various sources such as Oracle, SQL Server, and flat files using Informatica PowerCenter.
  • Experience in Object-Oriented Analysis and Design (OOAD) and development.
  • Hands-on experience in application development using Java, RDBMS, Linux, and UNIX shell scripting.
  • Hands-on experience with version control tools such as SVN, Bitbucket, and GitLab.
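The "custom MapReduce programs" bullet above refers to the classic map/shuffle/reduce pattern. A minimal sketch of that pattern in plain Python (illustrative only; the actual programs were written in Java against the Hadoop API, and all names here are hypothetical):

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit (word, 1) pairs, as a MapReduce mapper would."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle/sort: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big pipelines", "data pipelines at scale"]
counts = reduce_phase(shuffle(map_phase(lines)))
```

In a real Hadoop job the shuffle is handled by the framework; only the map and reduce functions are user code.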

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, YARN, HBase, Pig, Hive, Sqoop, Flume, Zookeeper, Spark, Storm, Hue, Impala, Kafka, Mahout, Oozie

Hadoop Distributions: Hortonworks Data Platform 2.3.6, Cloudera 5.0

Web Technologies: HTML, XML, CSS

Databases: SQL Server, MySQL, MongoDB, Cassandra

Operating Systems: Unix, Linux, CentOS, Windows, MacOS

Languages: Java, SQL, Linux shell scripting, Python

PROFESSIONAL EXPERIENCE:

Confidential, Irvine, CA

Hadoop Developer

Responsibilities:

  • Implemented the Spark framework and UNIX scripting to define the workflow for the jobs.
  • Involved in gathering business requirements, analyzing use cases, and implementing them end to end.
  • Worked closely with the architect; enhanced and optimized production Spark and Python code to aggregate, group, and run data mining tasks using the Spark framework.
  • Loaded raw data into RDDs and validated the data.
  • Converted the validated RDDs into DataFrames for further processing.
  • Implemented the Spark SQL code logic to join multiple data frames to generate application specific aggregated results.
  • Experienced in fine tuning the jobs for better performance in the production cluster space.
  • Worked entirely in Agile methodologies; used the Rally scrum tool to track user stories and team performance.
  • Worked extensively in Impala Hue to analyze the processed data and to generate the end reports.
  • Experienced in working with Hive databases through Beeline.
  • Worked on analyzing and resolving the production job failures in several scenarios.
  • Implemented UNIX scripts to define the use-case workflow, process the data files, and automate the jobs.
  • Knowledge of implementing JILs (Autosys job definitions) to automate jobs in the production cluster.

Environment: Spark, Python, Hive, Sqoop, Oozie, Unix Scripting, Spark SQL, Impala, Hue, Beeline, Autosys, Netezza.

Confidential, Jacksonville, FL

Hadoop Developer

Responsibilities:

  • Responsible for analyzing large data sets and derive customer usage patterns by developing new MapReduce programs.
  • Extensively worked on Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Worked with Hive to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Imported customer-specific personal data into Hadoop using Sqoop from relational databases such as Netezza and Teradata.
  • Built NiFi workflows for real-time data ingestion into Hadoop and Teradata simultaneously.
  • Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
  • Worked with BI teams in generating the reports and designing ETL workflows on Tableau.
  • Developed testing scripts in Python, prepared test procedures, analyzed test result data, and suggested improvements to the system and software.
  • Experience in working on SAS code to convert existing SAS datasets to the Hadoop environment.
  • Experience in Job management using Autosys scheduler and developed job processing scripts using Oozie workflow.
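The partitioned-Hive-metrics bullet above boils down to a group-by over a partition key. A plain-Python sketch of the same computation (in Hive this would be a `GROUP BY` over a table partitioned by `dt`; the records and field names are hypothetical):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical usage events; in Hive these would live in a table partitioned by dt.
events = [
    {"dt": "2016-01-01", "user": "u1", "bytes": 100},
    {"dt": "2016-01-01", "user": "u2", "bytes": 300},
    {"dt": "2016-01-02", "user": "u1", "bytes": 50},
]

events.sort(key=itemgetter("dt"))  # groupby requires input sorted on the key

# Per-partition metric: total bytes per day.
daily_bytes = {
    dt: sum(e["bytes"] for e in grp)
    for dt, grp in groupby(events, key=itemgetter("dt"))
}
```

Partitioning in Hive serves the same purpose as the sort-then-group here: it lets the engine touch only the partitions a query needs.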

Environment: Cloudera, HDFS, Hive, Sqoop, Python, Flume, Java, Shell script, Linux, Impala, Eclipse, SAS, Tableau, MySQL.

Confidential

Hadoop Developer

Responsibilities:

  • Created Hive tables to store processed results in a tabular format and wrote Hive scripts to transform and aggregate the disparate data.
  • Automated the extraction of data from warehouses and weblogs into Hive tables by developing workflow jobs in Oozie.
  • Worked on migrating customers from Teradata to Hadoop and was involved in the resulting Teradata decommissioning, which cut costs for the organization.
  • Developed a utility to move data from production to lower environments using DistCp.
  • Experience using Avro, Parquet, RCFile, and JSON file formats, and developed UDFs for Hive and Pig.
  • End-to-end development of the ETL process: sourced data from upstream, performed complex transformations, and exported the data to Teradata.
  • Exported the aggregated data into an RDBMS using Sqoop for building dashboards in Tableau and developed trend analyses using statistical features.
  • Scheduled volume snapshots for backup, performed root cause analysis of failures, and documented bugs and fixes for cluster downtime and maintenance.
  • Utilized Agile Scrum Methodology to manage and organize the team with regular code review sessions.
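The Oozie automation described above is driven by an XML workflow definition. A minimal sketch of such a workflow running a Hive script (the workflow name, script name, and properties here are hypothetical, not taken from the actual project):

```xml
<workflow-app name="weblog-to-hive" xmlns="uri:oozie:workflow:0.5">
  <start to="load-weblogs"/>
  <action name="load-weblogs">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_weblogs.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Hive load failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The `${jobTracker}` and `${nameNode}` parameters are supplied via a `job.properties` file when the workflow is submitted, so the same definition can run against different clusters.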

Environment: Cloudera, HDFS, Hive, Sqoop, Shell-script, LINUX, Impala, Teradata.

Confidential

Teradata Developer

Responsibilities:

  • Responsible for gathering requirements from Business Analysts and Operational Analysts and identifying the data sources required for the requests.
  • Proficient in importing/exporting large amounts of data between files and Teradata.
  • Developed the DW ETL scripts using BTEQ, Stored Procedures, Macros in Teradata.
  • Developed scripts for loading data into the base tables in the EDW using the FastLoad, MultiLoad, and BTEQ utilities of Teradata.
  • Created numerous scripts with the Teradata utilities BTEQ, MLOAD, and FLOAD.
  • Highly experienced in Performance Tuning and Optimization for increasing the efficiency of the scripts.
  • Developed reports using Teradata advanced techniques such as RANK and ROW_NUMBER.
  • Worked on data verification and validation to evaluate whether the generated data was appropriate and consistent with the requirements.
  • Tested the database for field size validation and check constraints, tested stored procedures, and cross-verified field sizes defined in the application against the metadata.
  • Proficient in working with SET, MULTISET, derived, and volatile temporary tables.
  • Designed and developed weekly, monthly reports related to the marketing and financial departments using Teradata SQL.
  • Extracted data from existing data sources and performed ad-hoc queries.
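The RANK/ROW_NUMBER reports mentioned above follow the windowed-numbering pattern: partition rows by a key, order within each partition, and number them. A plain-Python sketch of that logic (Teradata itself is not assumed available here; in SQL this would be `ROW_NUMBER() OVER (PARTITION BY cust ORDER BY amount DESC)`, and the data is hypothetical):

```python
# (customer, order amount) pairs standing in for rows of an orders table.
orders = [
    ("cust_a", 500), ("cust_a", 900), ("cust_b", 300), ("cust_a", 100),
]

# Sort by partition key, then descending amount, and number rows per partition.
ranked = {}
for cust, amount in sorted(orders, key=lambda o: (o[0], -o[1])):
    rows = ranked.setdefault(cust, [])
    rows.append((len(rows) + 1, amount))   # (row_number, amount)

# Typical report use: keep each customer's top-amount row (row_number == 1).
top_per_customer = {cust: rows[0][1] for cust, rows in ranked.items()}
```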

Environment: Teradata V12, BTEQ, MLOAD, FLOAD, ORACLE, SQL, PLSQL, UNIX, Windows XP.

Confidential

Software Engineer

Responsibilities:

  • Involved in Full Life Cycle Development in Distributed Environment Using Java and J2EE framework.
  • Responsible for developing and modifying the existing service layer based on the business requirements.
  • Involved in designing and developing web services using SOAP and WSDL.
  • Involved in database design.
  • Created tables, stored procedures in SQL for data manipulation and retrieval, Database Modification using SQL, PL/SQL, Stored procedures, triggers, Views in Oracle 9i.
  • Created User Interface using JSF.
  • Involved in integration testing the Business Logic layer and Data Access layer.
  • Integrated JSF with JSP and used JSF Custom Tag Libraries to display the value of variables defined in configuration files.
  • Used technologies like JSP, JSTL, JavaScript, HTML, XML and Tiles for Presentation tier.
  • Involved in JUnit testing of the application using JUnit framework.
  • Wrote stored procedures, functions, and views to retrieve data.
  • Used Maven builds to wrap around Ant build scripts.
  • Used CVS for version control of code and project documents.
  • Mentored and worked with team members to ensure that standards and guidelines were followed and tasks were delivered on time.

Environment: JQuery, JSP, Servlets, JSF, JDBC, HTML, JUnit, JavaScript, XML, SQL, Maven, Web Services, UML, Web Logic Workshop and CVS.
