Hadoop Developer Resume

SUMMARY:

Professional Software Developer with 10+ years of experience in the IT industry, including 4+ years in Hadoop/Big Data technologies and 2 years of extensive experience in Java, Python, database development, and data analytics.
Experience in using Cloudera and Hortonworks distributions.
Experience in analyzing data using Spark SQL, HiveQL, and custom MapReduce programs in Java.
Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
Experience with data file formats such as Avro, Parquet, RC, and ORC, and with compression codecs such as Snappy and bzip.
Experience with Oozie scheduler in setting up workflow jobs with actions that run Hive and Sqoop jobs.
Hands on experience with relational databases like Teradata, Oracle, and MySQL.
Strong experience in unit testing and system testing in Big Data.
Hands on experience with Spark using Scala and Python.
Hands on experience working with JSON files.
Hands on experience in Spark architecture and its integrations like Spark SQL, Data Frames and
Involved in database design, creating Tables, Views, Stored Procedures, Functions, Triggers and Indexes.
Experience in developing complex SQL queries, including unions and multi-table joins, and in working with views.
Involved in creating Hive tables and in loading and analyzing data using Hive queries.
Developed Hive queries to process the data and generate data cubes for visualization.
Experience using Flume to collect, aggregate, and store weblog data from sources such as web servers, mobile devices, and network devices, and to push it to HDFS.
Migrated from Flume to Spark for real-time data and developed a Spark Streaming application in Java to consume data from Kafka and push it into Hive.
Experienced in handling large datasets during the ingestion process using partitions, Spark in-memory capabilities, broadcasts, efficient joins, and transformations.
Exported the analyzed data to relational databases using Sqoop for visualization and for report generation by the BI team.
Experience in extracting, transforming, and loading (ETL) data from sources such as Oracle, SQL Server, and flat files using Informatica PowerCenter.
Experience in Object-Oriented Analysis and Design (OOAD) and development.
Hands on experience in application development using Java, RDBMS, Linux, and UNIX shell scripting.
Hands on experience with version control tools like SVN, Bitbucket, and GitLab.

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, YARN, HBase, Pig, Hive, Sqoop, Flume, ZooKeeper, Spark, Storm, Hue, Impala, Kafka, Mahout, Oozie

Hadoop Distributions: Hortonworks Data Platform 2.3.6, Cloudera 5.0

Web Technologies: HTML, XML, CSS

Databases: SQL Server, MySQL, MongoDB, Cassandra

Operating Systems: Unix, Linux, CentOS, Windows, MacOS

Languages: Java, SQL, Linux shell scripting, Python.

PROFESSIONAL EXPERIENCE:

Confidential, Irvine, CA

Hadoop Developer

Responsibilities:

Used the Spark framework and UNIX scripting to implement job workflows.
Involved in gathering business requirements, analyzing use cases, and implementing them end to end.
Worked closely with the architect; enhanced and optimized the product's Spark and Python code to aggregate, group, and run data mining tasks using the Spark framework.
Experienced in loading raw data into RDDs and validating the data.
Experienced in converting the validated RDDs into DataFrames for further processing.
Implemented Spark SQL logic to join multiple DataFrames and generate application-specific aggregated results.
Experienced in fine-tuning jobs for better performance in the production cluster.
Worked entirely within Agile methodologies, using the Rally Scrum tool to track user stories and team performance.
Worked extensively in Impala and Hue to analyze the processed data and generate the end reports.
Experienced in working with the Hive database through Beeline.
Worked on analyzing and resolving production job failures in several scenarios.
Implemented UNIX scripts to define the use case workflow, process the data files, and automate the jobs.
Knowledge of implementing JILs to automate jobs in the production cluster.

Environment: Spark, Python, Hive, Sqoop, Oozie, Unix Scripting, Spark SQL, Impala, Hue, Beeline, Autosys, Netezza.

Confidential, Jacksonville, FL

Hadoop Developer

Responsibilities:

Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
Extensively worked on Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Worked with Hive to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
Imported customer-specific personal data into Hadoop using Sqoop from relational databases such as Netezza and Teradata.
Built NiFi workflows for simultaneous real-time data ingestion into Hadoop and Teradata.
Experienced in running queries using Impala; used BI tools to run ad hoc queries directly on Hadoop.
Worked with BI teams in generating the reports and designing ETL workflows on Tableau.
Developed testing scripts in Python, prepared test procedures, analyzed test result data, and suggested improvements to the system and software.
Experience in working on SAS code to convert existing SAS datasets to the Hadoop environment.
Experience in job management using the Autosys scheduler; developed job processing scripts using Oozie workflows.

Environment: Cloudera, HDFS, Hive, Sqoop, Python, Flume, Java, Shell-script, LINUX, Impala, Eclipse, SAS, Tableau, MySQL.

Confidential

Hadoop Developer

Responsibilities:

Created Hive tables to store the processed results in tabular format and wrote Hive scripts to transform and aggregate disparate data.
Automated the extraction of data from warehouses and weblogs into Hive tables by developing workflow jobs in Oozie.
Worked on migrating customers from Teradata to Hadoop and was involved in the Teradata decommission, which reduced costs for the organization.
Developed a utility to move data from production to lower lanes using DistCp.
Used Avro, Parquet, RCFile, and JSON file formats and developed UDFs for Hive and Pig.
Developed the ETL process end to end: sourced data from upstream, performed complex transformations, and exported the data to Teradata.
Exported the aggregated data into an RDBMS using Sqoop to create dashboards in Tableau and developed trend analyses using statistical features.
Scheduled snapshots of volumes for backup, performed root cause analysis of failures, and documented bugs and fixes for cluster downtime and maintenance.
Utilized Agile Scrum Methodology to manage and organize the team with regular code review sessions.

Environment: Cloudera, HDFS, Hive, Sqoop, Shell-script, LINUX, Impala, Teradata.

Confidential

Teradata Developer

Responsibilities:

Responsible for gathering requirements from Business Analysts and Operational Analysts and identifying the data sources required for the requests.
Proficient in importing/exporting large amounts of data from files to Teradata and vice versa.
Developed the DW ETL scripts using BTEQ, Stored Procedures, Macros in Teradata.
Developed scripts for loading data into the base tables in the EDW using the FastLoad, MultiLoad, and BTEQ utilities of Teradata.
Created numerous scripts with Teradata utilities BTEQ, MLOAD and FLOAD.
Highly experienced in Performance Tuning and Optimization for increasing the efficiency of the scripts.
Developed reports using advanced Teradata techniques such as rank and row number.
Worked on data verification and validation to confirm that the data generated according to the requirements is appropriate and consistent.
Tested the database to check field size validation, check constraints, and stored procedures, and cross-verified the field sizes defined within the application against the metadata.
Proficient in working with Set, Multiset, Derived, and Volatile Temporary tables.
Designed and developed weekly and monthly reports for the marketing and financial departments using Teradata SQL.
Extracted data from existing data sources and performed ad hoc queries.

Environment: Teradata V12, BTEQ, MLOAD, FLOAD, Oracle, SQL, PL/SQL, UNIX, Windows XP.

Confidential

Software Engineer

Responsibilities:

Involved in full life cycle development in a distributed environment using Java and the J2EE framework.
Responsible for developing and modifying the existing service layer based on the business requirements.
Involved in designing and developing web services using SOAP and WSDL.
Involved in database design.
Created tables and stored procedures in SQL for data manipulation and retrieval; performed database modifications using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
Created User Interface using JSF.
Involved in integration testing of the Business Logic layer and Data Access layer.
Integrated JSF with JSP and used JSF Custom Tag Libraries to display the value of variables defined in configuration files.
Used technologies like JSP, JSTL, JavaScript, HTML, XML, and Tiles for the presentation tier.
Involved in unit testing of the application using the JUnit framework.
Wrote stored procedures, functions, and views to retrieve the data.
Used Maven builds to wrap around Ant build scripts.
Used CVS for version control of code and project documents.
Mentored and worked with team members to ensure that standards and guidelines were followed and tasks were delivered on time.

Environment: jQuery, JSP, Servlets, JSF, JDBC, HTML, JUnit, JavaScript, XML, SQL, Maven, Web Services, UML, WebLogic Workshop, and CVS.
