
Sr. Hadoop Engineer Resume


Los Angeles, CA

SUMMARY

  • 8+ years of experience in IT, including design and development of object-oriented, web-based enterprise applications and big data processing applications.
  • 5+ years of experience working with Hadoop ecosystem components such as MapReduce, Spark, Sqoop, Flume, Pig, Hive, HBase, Oozie, Kafka, and ZooKeeper.
  • Extensive experience with mapping, analysis, transformation, and support of application software.
  • Good understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and MapReduce.
  • Experience with Hadoop distributions such as Cloudera (CDH3, CDH5) and Hortonworks.
  • Experience performing analytics using HiveQL and Spark SQL DataFrames.
  • Expertise in analyzing data with Hive and writing custom UDFs in Python to extend Hive functionality (see the sketch after this list).
  • Experience transferring data from RDBMSs to HDFS and Hive tables using Sqoop.
  • Experience creating Sqoop jobs and automating them with shell scripts.
  • Experience writing shell scripts on UNIX systems.
  • Experience in data transformation using Hive and Pig across different file formats.
  • Worked with the Oozie workflow engine for job scheduling.
  • Experience using the Ambari UI to monitor Hadoop clusters.
  • Experience monitoring Hadoop log files.
  • Involved in migrating Hive queries to Impala and performing analytics in Impala.
  • Worked with admin teams to install Hadoop updates, patches, and version upgrades as required.
  • Responsible for creating, modifying, and deleting Kafka topics as required by the business team.
  • Experience using HBase.
  • Experience working with AWS S3.
  • Quick to adapt to new technologies and products; a self-starter with excellent communication skills and a good understanding of business workflows.
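
A minimal sketch of the Python-based Hive UDF work mentioned above, using Hive's TRANSFORM streaming mechanism; the script name, columns, and cleanup logic are hypothetical.

```python
#!/usr/bin/env python
# Hypothetical Hive streaming "UDF"; invoked from Hive as, e.g.:
#   ADD FILE clean_fields.py;
#   SELECT TRANSFORM (id, name) USING 'python clean_fields.py' AS (id, name)
#   FROM raw_table;
# Hive pipes each input row to stdin as a tab-separated line.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    # Hypothetical cleanup: trim whitespace and normalize case on every field.
    print("\t".join(f.strip().lower() for f in fields))
```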

TECHNICAL SKILLS

Scripting Languages: Shell scripting, Python, Unix scripting

Big Data Technologies: HDFS, EMRFS, MapReduce, Hive, HiveQL, Pig, Sqoop, Flume, Spark, ZooKeeper, Oozie, Kafka, HBase

Programming Languages: Python, SQL, Java, Spark

Databases: Oracle 10g/9i, MS SQL Server 2000/2005/2008, MySQL, Teradata, MongoDB

Data warehousing and Reporting tools: SSIS, SSRS

IDEs: NetBeans, Eclipse, IDLE

Virtual Machines: VMWare, Virtual Box

Operating Systems: CentOS 5.5, UNIX, Red Hat Linux, Windows 7, Ubuntu

PROFESSIONAL EXPERIENCE

Confidential, Los Angeles, CA

Sr. Hadoop Engineer

Responsibilities:

  • Developed complete end-to-end big data processing pipelines in the Hadoop ecosystem.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
  • Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back to the RDBMS through Sqoop.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Migrated complex MapReduce programs and Hive scripts to Spark RDD transformations and actions; wrote UDFs and MapReduce jobs as specific requirements dictated.
  • Created Java algorithms to compute mortgage and credit risk factors.
  • Designed algorithms for the complex map and reduce functionality of all MapReduce programs.
  • Worked extensively on code reviews and code remediation to meet coding standards.
  • Wrote Sqoop scripts to import and export data across various RDBMS systems.
  • Wrote Spark SQL scripts to optimize query performance (see the sketch after this list).
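
A sketch of the Spark SQL aggregation pattern described above, written in PySpark for brevity (the original code was Scala); paths, columns, and the app name are hypothetical, and SparkSession assumes Spark 2.x (on the CDH 5.x Spark 1.6 stack, HiveContext/SQLContext would be used instead).

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("risk-aggregation-sketch").getOrCreate()

# Hypothetical input; the real pipeline read loan data from HDFS.
loans = spark.read.parquet("hdfs:///data/loans/")
loans.createOrReplaceTempView("loans")

# Express the aggregation in Spark SQL so it runs distributed, in memory.
risk = spark.sql("""
    SELECT region,
           AVG(credit_score) AS avg_credit_score,
           SUM(loan_amount)  AS total_exposure
    FROM loans
    GROUP BY region
""")

# The original flow exported results back to an RDBMS via Sqoop;
# writing to HDFS keeps this sketch self-contained.
risk.write.mode("overwrite").parquet("hdfs:///output/risk_by_region/")
spark.stop()
```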

Environment: CDH 5.8.3, HDFS, Spark, Pig, Hive, Beeline, Sqoop, MapReduce, Oozie, AWS, Java 6/7, Git, Oracle 10g, YARN, UNIX shell scripting.

Confidential, Fort Lauderdale, FL

Big data/ Hadoop Developer

Responsibilities:

  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Used Pig for transformations, event joins, filtering bot traffic, and some pre-aggregations before storing the data in HDFS.
  • Developed Pig UDFs for needed functionality not available out of the box in Apache Pig.
  • Developed Spark modules using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (see the Kafka sketch after this list).
  • Analyzed existing SQL scripts and designed solutions implemented in PySpark.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Developed Hive scripts to create, alter, and drop Hive tables.
  • Created scalable, high-performance web services for data tracking.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Managed the Hadoop cluster using Cloudera Manager.
  • Used HCatalog to access Hive table metadata from MapReduce and Pig code.
  • Used Sqoop to import and export data between HDFS and Oracle.
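
A minimal sketch of a Kafka producer/consumer pair like those described above; it assumes the kafka-python client, a local broker, and a hypothetical topic (the original components may have been written in Java).

```python
import json
from kafka import KafkaProducer, KafkaConsumer  # kafka-python client (assumed)

# Producer: publish customer-behavior events to a hypothetical topic.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("customer-events", {"user_id": 42, "action": "page_view"})
producer.flush()

# Consumer: read the topic from the beginning; stop after 10s of silence.
consumer = KafkaConsumer(
    "customer-events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)
```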

Environment: MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Splunk, Flume, Oracle 11g, Core Java, Cloudera, Eclipse, Python, Scala, Spark, SQL, Teradata, UNIX shell scripting

Confidential, Charlotte, NC

Big data/ Hadoop Developer

Responsibilities:

  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration (see the streaming sketch after this list).
  • Worked with HBase and Hive scripts to extract, transform, and load data into HBase and Hive.
  • Moved all log files generated by various sources to HDFS for further processing.
  • Developed workflows using custom MapReduce, Pig, Hive, and Sqoop.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
  • Tuned the cluster for optimal performance when processing these large data sets.
  • Worked hands-on with the ETL process: imported data from various data sources and performed transformations in Hive.
  • Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
  • Loaded generated HFiles into HBase for faster access to a large customer base without a performance hit.
  • Wrote a Hive UDF to sort struct fields and return a complex data type.
  • Responsible for loading data from the UNIX file system to HDFS.
  • Wrote HiveQL and SQL queries.
  • Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MRUnit testing library.
  • Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
  • Modeled Hive partitions extensively for data separation and faster data processing.
  • Worked on ad hoc queries, indexing, replication, load balancing, and aggregation in MongoDB.
  • Worked with network and Linux system engineers to define optimal network configurations, server hardware, and operating systems.
  • Evaluated and proposed new tools and technologies to meet the needs of the organization.
  • Production support responsibilities included cluster maintenance.
  • Applied Pig and Hive best practices for tuning.
  • Gained solid experience with NoSQL databases.
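
The parsing jobs above were Java MapReduce; purely to illustrate the map/reduce log-parsing pattern, here is an equivalent Hadoop Streaming sketch in Python that assumes combined-format access logs.

```python
#!/usr/bin/env python
# mapper.py -- emit (status_code, 1) for each access-log line. Run with
# Hadoop Streaming, e.g.:
#   hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py \
#       -input /logs/raw -output /logs/status_counts
import sys

for line in sys.stdin:
    parts = line.split()
    if len(parts) > 8:              # skip malformed lines
        print("%s\t1" % parts[8])   # field 9 is the HTTP status code
```

```python
#!/usr/bin/env python
# reducer.py -- sum counts per status code (input arrives sorted by key).
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key == current_key:
        count += int(value)
        continue
    if current_key is not None:
        print("%s\t%d" % (current_key, count))
    current_key, count = key, int(value)

if current_key is not None:
    print("%s\t%d" % (current_key, count))
```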

Environment: CDH, Hive, MySQL, HBase, HDFS, Eclipse, Hadoop, Oracle, PL/SQL, SQL*Plus, Toad 9.6, Flume, Pig, Sqoop, UNIX.

Confidential, Germantown, Maryland

SQL Developer

Responsibilities:

  • Created complex stored procedures, functions, triggers, tables, indexes, views, SQL joins, and T-SQL queries to test and implement business rules. Worked closely with the DBA team to regularly monitor the system for bottlenecks and implement appropriate solutions.
  • Created non-clustered indexes to improve query performance and optimization (see the sketch after this list).
  • Maintained and managed databases and stored procedures using SQL Server tools such as the performance tuner and SQL Profiler.
  • Involved in package migration from DTS to SSIS: ran the Upgrade Advisor against DTS packages before migration, troubleshot issues, and converted packages to SSIS through the wizard.
  • Extracted data from flat and Excel files and loaded it into the SQL Server database using Bulk Insert.
  • Created SQL queries to extract and compare data across different sources.
  • Created SSRS reports from SQL Server tables for different clients' data-analysis needs. Involved in data analysis, comparison, and validation. Created ETL packages to validate, extract, transform, and load data into the data warehouse and data marts.
  • Developed SSIS packages for extracting data from the file system, transforming it, and loading it into OLAP. Used SSIS to populate data from various data sources, creating packages for different data-loading operations for applications.
  • Created reports using global variables, expressions, and functions in MS SQL Server Reporting Services. Designed and delivered dynamic reporting solutions using MS SQL Server Reporting Services.
  • Applied conditional formatting in SSRS to highlight key areas in report data.
  • Heavy integration of AS/400 IBM DB2 with SQL Server using the StarQuest replication tool.
  • Used Report Builder for ad-hoc reporting. Developed various complex report types such as drill-down, drill-through, gauges, pie charts, bar charts, and sub-reports. Used Reporting Services to schedule reports to be generated at predetermined times.
  • Created stored procedures for commonly used complex queries involving joins and unions of multiple tables. Created views to enforce security and data customization.
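
A hedged sketch of the index and stored-procedure work described above; the table, column, and procedure names are hypothetical, and pyodbc with the SQL Server ODBC driver is assumed only so the T-SQL can be shown in a runnable script.

```python
import pyodbc

# Assumed connection details; adjust driver/server/database to the environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=SalesDW;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Non-clustered index covering a frequent lookup path, as described above.
cursor.execute("""
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
    ON dbo.Orders (CustomerId, OrderDate)
    INCLUDE (TotalAmount);
""")

# A stored procedure wrapping a commonly used aggregate query.
cursor.execute("""
    CREATE PROCEDURE dbo.usp_TopCustomers @Since DATE AS
    BEGIN
        SELECT TOP (10) CustomerId, SUM(TotalAmount) AS Total
        FROM dbo.Orders
        WHERE OrderDate >= @Since
        GROUP BY CustomerId
        ORDER BY Total DESC;
    END;
""")
conn.commit()
```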

Environment: SQL Server (SSIS, SSRS, SSAS) 2008R2/2008/2005, DB2, HP Quality Center 10.0, Visual Studio 2005, XML, XSLT, MS Office, and Visual SourceSafe.

Confidential

Software Developer

Responsibilities:

  • Documented the entire project, with detailed descriptions of all functionality.
  • Actively involved in gathering and analyzing user requirements in coordination with the business.
  • Worked as a developer creating complex stored procedures, triggers, functions, indexes, tables, views, other T-SQL code, and SQL joins for applications.
  • Extensively used T-SQL to construct user-defined functions, views, indexes, user profiles, relational database models, and data integrity constraints.
  • Created stored procedures to validate incoming data for discrepancies using data conversions (see the sketch after this list).
  • Generated SSRS enterprise reports from the SQL Server (OLTP) database, including reporting features such as groups, sub-groups, adjacent groups, group totals, group sub-totals, grand totals, drill-downs, drill-throughs, sub-reports, etc.
  • Created parameterized reports, including standard and cascading parameter reports, using report criteria to minimize report execution time and limit the number of records returned.
  • Worked on all report types: tables, matrices, charts, sub-reports, etc.
  • Created linked reports, ad-hoc reports, etc. based on requirements; linked reports were created on the Report Server to reduce report duplication.
  • Designed models using Framework Manager and deployed packages to the ReportNet servers.
  • Implemented security to restrict user access and allow users to run only certain reports.
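
A minimal sketch of calling the kind of parameterized validation procedure described above; the procedure name, parameters, and result columns are hypothetical, and pyodbc is assumed as the client.

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=ReportsDB;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# ODBC escape syntax for a parameterized stored-procedure call.
cursor.execute("{CALL dbo.usp_ValidateCustomerData (?, ?)}", ("2008-01-01", 100))

# Each returned row flags a record that failed a conversion or lookup check.
for row in cursor.fetchall():
    print(row.CustomerId, row.Discrepancy)
```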

Environment: SQL, SQL Server 2008, SQL Server Management Studio, SSRS, SSIS, T-SQL, Microsoft Excel, Notepad++.
