Senior Hadoop Developer Resume

Southlake, TX

SUMMARY:

  • Over 8 years of professional experience in designing, developing, integrating and testing software applications, including 5 years of experience in Hadoop Big Data technologies such as MapReduce, Hive, Spark, Impala and Sqoop, and 3+ years of experience in Java.
  • Hands-on experience programming in Java, Scala and Python, with strong knowledge of object-oriented concepts.
  • Good experience in data warehousing using relational database management systems such as Oracle, MySQL and Microsoft SQL Server.
  • Proficient in using IDEs such as Eclipse, IntelliJ, PyCharm and DbVisualizer, in addition to experience in web development technologies such as HTML, CSS and JavaScript.
  • Expertise in importing data in different formats from various RDBMSs into HDFS, Hive and Impala, and exporting it back.
  • Hands-on experience working with user-defined functions in Hive and Impala using Java and Python scripts.
  • Efficient in writing MapReduce programs using the Apache Hadoop API for analyzing structured and unstructured data.
  • Skilled at extraction, transformation and analysis of Big Data using Impala, Spark and Hive, respectively.
  • Good at optimizing and debugging HiveQL queries, Spark scripts and MapReduce programs.
  • Expert understanding of design patterns with strong analytical skills.
  • Experience in Big Data solutions for traditional enterprise businesses.
  • Proficient in gathering requirements, analysis, validation, business requirements specifications and functional specifications for schema creations and table creations.
  • Extensive experience in all phases of Software development life cycle (SDLC).
  • Hands-on experience in tuning mappings, with expertise in identifying and resolving performance bottlenecks at various levels.
  • Excellent skills in analyzing system architecture usage, and in defining and implementing procedures.
  • A quick learner, punctual and trustworthy.
  • Motivated problem solver and resourceful team member with decent written and verbal communication skills.

TECHNICAL SKILLS:

Development Technologies: Java, Python, Scala (SBT and Maven) and Unix Shell Scripting.

IDEs: Eclipse, IntelliJ, PyCharm and DbVisualizer.

Hadoop Distributions: CDH 5.0.3 and AWS.

Operating Systems: Mac, Ubuntu and Windows.

Databases: MySQL, Oracle, Hive, Impala and Redshift.

File Formats: Gzip, flat, Avro, Sequence, Parquet and ORC.

Reporting: Microsoft Office (PowerPoint, Word and Excel), GitHub and Tableau.

PROFESSIONAL EXPERIENCE:

Senior Hadoop Developer

Confidential, Southlake, TX

Responsibilities:

  • Developed MapReduce programs to parse raw data in XML and JSON format, populate staging tables and store the refined data in partitioned tables in the EDW using Hive and Impala.
  • Created Hive queries that helped Data Scientists to spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Managed and reviewed Hadoop log files, and troubleshot and tuned YARN parameters as required.
  • Worked on MapReduce jobs to transform Avro files to Parquet and text files to sequence files for performance gains.
  • Developed Scala programs in IntelliJ with SBT and Maven to read, parse and write both flat and compressed files in the Hive data warehouse using Spark.
  • Optimized the performance of MapReduce, Spark and Hive scripts.
  • Wrote Spark SQL and Spark jobs in Python to perform large scale aggregations and joins.
  • Queried Avro and Parquet files in an S3 bucket using HiveQL, with EC2 instances configured as compute nodes.
  • Exported data to AWS S3 and configured VPC security to query the data with Redshift for query performance monitoring.
  • Created many custom reports and Tableau dashboards with interactive views, trends and drill-downs.
  • Developed dashboards, prepared ad hoc reports and SOPs, and proofread all documents to maintain accuracy of information.
  • Interacted with the existing database developers and DBA to understand the existing schema.
  • Met with business users and clients during the requirements analysis phase, and presented POC reports to the end client.

Environment: Big Data Platform - Hadoop 2.6.0 (CDH 5.5.1), Oracle BDA, Hadoop HDFS, MapReduce, Hive, Sqoop, Spark, Impala, Java, Shell Scripts, Python, Scala, Eclipse, Tableau, PuTTY, FileZilla, Trifacta and IntelliJ.

Senior Hadoop Developer

Confidential, Mayfield Village, OH

Responsibilities:

  • Prepared technical design documents and data flow diagrams based on business requirements.
  • Implemented new designs as per technical specifications.
  • Integrated Hadoop with Oracle in order to load and then cleanse raw unstructured data in Hadoop ecosystem to make it suitable for processing in Oracle using stored procedures and functions.
  • Used the MapReduce programming model for batch processing of data stored in HDFS.
  • Developed Java MapReduce programs to transform log data into a structured form and extract user location, login/logout times, time spent and errors.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Used Sqoop to import data into HDFS and export data from HDFS to an Oracle database.
  • Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in Hive queries.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Optimized Hive tables using techniques such as partitioning and bucketing to improve the performance of HiveQL queries.
  • Developed Spark Scala scripts for ETL operations on captured data and for delta record processing between newly arrived data and existing data in HDFS.
  • Extensively used Pig for data cleansing.
  • Used PySpark to perform transformations and event joins, filter bot traffic and run some pre-aggregations before storing the data in HDFS.
  • Extended Hive, Pig and Impala core functionality by writing custom UDFs in Java and Python.
  • Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Worked extensively on performance optimization, adopting appropriate design patterns for MapReduce jobs based on analysis of I/O latency, map time, combiner time, reduce time, etc.
  • Troubleshooting: Used Hadoop logs to debug the scripts.

Environment: Big Data Platform - CDH 5.0.3, Hadoop HDFS, MapReduce, Hive, Sqoop, Spark, Impala, Java, Shell Scripts, Oracle 10g, Eclipse, Tableau, PuTTY and IntelliJ.

Senior Hadoop Developer

Confidential, Houston, TX

Responsibilities:

  • Integrated, managed and optimized utility systems, including assets, devices, networks, servers, applications and data.
  • Ensured quality integration of smart meter functions into the system's data acquisition and processing.
  • Enabled the use of metering data for a variety of applications such as billing, outage detection and recovery, fraud detection, finance, energy efficiency, customer care and a variety of analytics.
  • Analyzed large amounts of raw data to produce usable information. Compiled technical specifications that allowed IT to create data systems supporting the smart metering system.
  • Conducted technical reviews and provided quick-fix solutions to the customer for production defects.
  • Developed Map-Reduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the enterprise data warehouse (EDW).
  • Imported and exported data between HDFS and relational database systems using Sqoop.
  • Wrote Java MapReduce programs to analyze log data for large-scale weather data sets.
  • Involved in testing Map-Reduce programs using MRUnit and JUnit testing frameworks.
  • Customized the parser/loader application for data migration to HBase.
  • Provided support for data analysts running ad hoc Pig and Hive queries.
  • Developed PL/SQL procedures, functions and packages using Oracle utilities such as PL/SQL and SQL*Loader, with exception handling, to implement key business logic.
  • Used the PL/SQL bulk collect feature to optimize ETL performance. Fine-tuned and optimized a number of SQL queries and performed code debugging.
  • Developed UNIX and SQL scripts to load large volumes of data for data mining and data warehousing.

Environment: Big Data Platform - CDH 4.2.1, Hadoop HDFS, Map Reduce, Hive, Sqoop, IBM DB2, PL/SQL, UNIX, Python, Eclipse.

Hadoop Developer

Confidential, NC

Responsibilities:

  • Involved in the design and development of the server-side layer using XML, JDBC and JDK patterns in the Eclipse IDE.
  • Involved in unit testing, system integration testing and enterprise user testing.
  • Extensively used Core Java, Servlets, and JDBC.
  • Developed a data pipeline using Hive, Sqoop, Spark and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from various sources.
  • Wrote MRUnit test cases to test and debug MapReduce programs on a local machine.
  • Involved in creating Hive tables, loading data and running Hive queries on that data.
  • Used Sqoop to import data from Oracle into HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Wrote Hive queries to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Developed Pig UDFs to pre-process data for analysis.
  • Developed complex, multi-step data pipelines using Spark.
  • Wrote Spark SQL queries for data analysis.

Environment: Big Data Platform - CDH 4.0.1, XML, Hadoop HDFS, Spark, Hive, Sqoop, Impala, Oracle 10g, Java, Eclipse.

Hadoop Developer

Confidential, NY

Responsibilities:

  • Involved in the analysis, design and development of data collection, data ingestion, data profiling and data aggregation.
  • Worked on development of the controller, batch and logging modules using JDK 1.6.
  • Worked on development of the data ingestion process using FS Shell and on data loading into HDFS.
  • Defined Hive queries for profiling rules such as business checks, outlier checks, and domain and data range validation.
  • Automated the generation of Hive queries and MapReduce programs.
  • Developed user-defined functions in Java and Python to facilitate data analysis in Hive and Pig.
  • Managed end-to-end delivery during the different phases of the software implementation.
  • Involved in initial POC implementation using Hadoop - Map Reduce, Spark Scripting, and Hive Scripting.
  • Designed the framework for data ingestion, data profiling and generating the Risk Aggregation report based on various business entities.
  • Mapped the business requirements and rules with the Risk Aggregation System.
  • Used JDBC to invoke Stored Procedures and database connectivity to ORACLE.
  • Debugged code and created documentation for future use.

Environment: Big Data Platform - CDH 3, Map-Reduce, Hive, Spark Scripting, JDK 1.6, and Oracle.

Java Developer

Confidential

Responsibilities:

  • Involved in various stages of projects, from architecture design and business analysis through development, testing and production.
  • Designed and developed different modules as part of project using Java/J2EE.
  • Contributed to an effective order processing system and simplified the existing order process, which proved to be more efficient.
  • Developed session beans as an enterprise business service object.
  • Used JDBC and the transaction API provided by the application server to access data from Oracle.
  • Involved in developing the unit test classes using JUnit.
  • Used CVS for version control integrated with WSAD.
  • Used JavaScript for client-side validation.
  • Used Cascading Style Sheets in the application.
  • Involved in development of the user interface using JSPs.
  • Used Tomcat as the application server in the application.

Environment: Core Java, JDK 1.3, JUnit, WSAD, Oracle, JavaScript, JDBC, EJB, CSS, HTML, Tomcat application server.

Java Developer

Confidential

Responsibilities:

  • Gathered user requirements, followed by analysis and design.
  • Worked on the technical design to conform to the framework.
  • Developed JSPs, action classes, form beans, response beans, EJBs.
  • Coded Servlets for the Transactional Model to handle many requests.
  • Developed business objects and business object helpers, which interact with middleware stubs.
  • Implemented business delegate pattern to separate view from business process.
  • Extensively used XML to code configuration files.
  • Developed PL/SQL stored procedures and triggers.
  • Developed the complete web tier of the application with the Struts MVC framework.
  • Performed functional, integration, system, and validation testing.

Environment: JDK 1.3, JSP, Apache Struts 1.0, Servlets, EJB 2.1, XML, JDBC, Eclipse, JBoss, PL/SQL, Oracle 9i, Rational Rose, UNIX, MVC framework, JUnit, Rational ClearCase.
