
Big Data/Hadoop Developer Resume


San Jose, CA

SUMMARY:

  • Spark/Hadoop Developer with 8+ years of professional IT experience, including 3+ years as a Big Data consultant, involved in analysis, design and development using Hadoop ecosystem components and performing data ingestion, data modeling, querying, processing, storage analysis and data integration, and implementing enterprise-level systems that transform Big Data.
  • Proficient in Oracle Packages, Procedures, Functions, Trigger, Views, SQL Loader, Performance Tuning, UNIX Shell Scripting, Data Architecture.
  • Strong hands-on experience in data extraction, transformation, loading, data analysis and data visualization using the Cloudera platform (Spark, Scala, HDFS, Hive, Sqoop, Kafka, Oozie).
  • Developed end-to-end Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities as per requirements (a representative sketch follows this summary).
  • Worked on data modeling using various machine learning algorithms in R and Python (GraphLab); worked with programming languages including Core Java and Scala.
  • Experienced in developing and implementing Spark programs in Scala on Hadoop to work with structured and semi-structured data.
  • Utilized Spark for interactive queries, processing of streaming data and integration with NoSQL databases for large volumes of data.
  • Extracted data from heterogeneous sources such as flat files, MySQL and Teradata into HDFS using Sqoop, and vice versa.
  • Broad experience working with structured data using Spark SQL, DataFrames and HiveQL, optimizing queries, and incorporating complex UDFs into business logic.
  • Experience working with Text, Sequence files, XML, Parquet, JSON, ORC, AVRO file formats and Click Stream log files.
  • Experienced in migrating ETL transformations using Spark jobs and Pig Latin Scripts.
  • Experience in transferring Streaming data from different data sources into HDFS and HBase using Apache Kafka and Flume.
  • Experience in using Oozie schedulers and Unix scripting to implement cron jobs that execute different kinds of Hadoop actions.
  • Good experience in optimization/performance tuning of Spark Jobs, PIG & Hive Queries.
  • Comfortable with data architecture, including data ingestion pipeline design, Hadoop architecture, data modeling, data mining and advanced data processing; experience optimizing ETL workflows.
  • Excellent understanding of Spark architecture and framework: SparkContext, APIs, RDDs, Spark SQL, DataFrames, Streaming and MLlib.
  • Adequate understanding of Hadoop Gen1/Gen2 architecture, with hands-on experience in Hadoop components such as JobTracker, TaskTracker, NameNode, Secondary NameNode and DataNode, the YARN architecture and its daemons (NodeManager, ResourceManager and ApplicationMaster), and the Map Reduce programming paradigm.
  • Hands on experience in using the Hue browser for interacting with Hadoop components.
  • Good understanding and Experience with Agile and Waterfall methodologies of Software Development Life Cycle (SDLC).
  • Highly motivated, self-learner with a positive attitude, willingness to learn new concepts and accepts challenges.
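
The Spark work summarized above (end-to-end Scala applications for data cleansing, validation, transformation and summarization with DataFrames and Spark SQL) follows a common pattern. The sketch below is a minimal, hypothetical illustration of that pattern against the Spark 1.6 SQLContext API listed later in this resume; the input path, column names and aggregation logic are assumptions for illustration only, not code from any actual project.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions._

    object CustomerCleanse {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("customer-cleanse"))
        val sqlContext = new SQLContext(sc)

        // Hypothetical JSON input on HDFS; path and column names are illustrative only
        val raw = sqlContext.read.json("hdfs:///data/raw/customers")

        // Cleansing and validation: trim strings, drop records missing the key, de-duplicate
        val clean = raw
          .withColumn("name", trim(col("name")))
          .filter(col("customer_id").isNotNull)
          .dropDuplicates(Seq("customer_id"))

        // Summarization: aggregate customer counts and spend per state
        val summary = clean
          .groupBy("state")
          .agg(count("customer_id").as("customers"), sum("order_total").as("total_spend"))

        // Persist the curated output as Parquet for downstream Hive/Impala access
        summary.write.mode("overwrite").parquet("hdfs:///data/curated/customer_summary")

        sc.stop()
      }
    }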

TECHNICAL SKILLS:

Big Data Technologies: Spark and Scala, Hadoop Ecosystem Components - HDFS, Hive, Sqoop, Impala, Flume, Map Reduce, Pig and Cloudera Hadoop Distribution CDH 5.8.2

Cloud Platforms: Amazon Web Services (AWS), Microsoft Azure and Confidential Cloud

Monitoring Tools: Cloudera Manager

Programming Languages: Scala, Java, SQL, PL/SQL, Python.

Scripting Languages: Shell Scripting, CSH.

NoSQL Databases: HBase

Databases: Oracle 11g, MySQL, MS SQL Server

Schedulers: Oozie

Operating Systems: Windows 7/8/10, Unix, Linux

Other Tools: Hue, IntelliJ IDEA, Eclipse, Maven, ZooKeeper

Front End Technologies: HTML5, XHTML, XML, CSS

PROFESSIONAL EXPERIENCE:

Confidential, San Jose, CA

Big Data/Hadoop Developer

Responsibilities:

  • Developed Map Reduce programs for data extraction, transformation and aggregation; supported Map Reduce jobs running on the cluster.
  • Developed Pig scripts for migrating the existing home loans legacy process to Hadoop, with data fed back to the retail legacy mainframe systems.
  • Worked on Oracle RDBMS SQL for data CRUD operations.
  • Implemented solutions for ingesting data from various sources and processing the data using Big Data technologies such as Hive, Spark, Pig, Sqoop, HBase, Map Reduce, etc.
  • Worked on creating combiners, partitioners and distributed cache to improve the performance of Map Reduce jobs (a representative sketch follows this list).
  • Developed Pig Scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
  • Experienced in handling Avro data files by passing the schema into HDFS using Avro tools and Map Reduce.
  • Optimized Map Reduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
  • Wrote Hive Queries to have a consolidated view of the telematics data.
  • Orchestrated many Sqoop scripts, Pig scripts, Hive queries using Oozie workflows and sub workflows.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers and pushed to HDFS.
  • Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run Map Reduce jobs in the backend.
  • Developed HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Involved in debugging Map Reduce jobs using MRUnit framework and optimizing Map Reduce jobs.
  • Involved in troubleshooting errors in Shell, Hive and Map Reduce.
  • Worked on debugging, performance tuning of Hive & Pig Jobs.
  • Designed and implemented Map Reduce jobs to support distributed processing using Map Reduce, Hive and Apache Pig.
  • Created Hive external tables on the Map Reduce output, with partitioning and bucketing applied on top of them.
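
The combiner and partitioner bullets above describe a standard Map Reduce shuffle optimization. Jobs of this kind are usually written in Java; purely to keep the examples in this document in one language, the hypothetical sketch below shows the same pattern in Scala against the Hadoop MapReduce API. The token-counting logic, class names and paths are assumptions, not code from the project.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper, Partitioner, Reducer}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Mapper: emits (token, 1) for every whitespace-separated token in a record
    class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
      private val one  = new IntWritable(1)
      private val word = new Text()
      override def map(key: LongWritable, value: Text,
                       ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
        value.toString.split("\\s+").filter(_.nonEmpty).foreach { t =>
          word.set(t); ctx.write(word, one)
        }
    }

    // Reducer: sums counts; also registered as the combiner to cut shuffle volume
    class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
      override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                          ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
        var sum = 0
        val it = values.iterator()
        while (it.hasNext) sum += it.next().get()
        ctx.write(key, new IntWritable(sum))
      }
    }

    // Custom partitioner: routes keys by first character so related keys land together
    class FirstCharPartitioner extends Partitioner[Text, IntWritable] {
      override def getPartition(key: Text, value: IntWritable, numPartitions: Int): Int =
        (key.toString.headOption.getOrElse(' ').toInt & Integer.MAX_VALUE) % numPartitions
    }

    object AggregationJob {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "token-aggregation")
        job.setJarByClass(classOf[TokenMapper])
        job.setMapperClass(classOf[TokenMapper])
        job.setCombinerClass(classOf[SumReducer])          // combiner reuses the reducer logic
        job.setReducerClass(classOf[SumReducer])
        job.setPartitionerClass(classOf[FirstCharPartitioner])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }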

Environment: Hadoop, Map Reduce, HDFS, Hive, Pig, Sqoop, Hbase, DB2, Flume, Oozie, CDH 5.6.1, Maven, Unix Shell Scripting.

Confidential, Woonsocket, RI

Spark/ Hadoop Developer

Responsibilities:

  • Involved in scripting Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirements.
  • Loaded the data into Spark RDDs and performed in-memory data computation to generate output matching the requirements.
  • Developed data pipelines using Spark, Hive and Sqoop to ingest, transform and analyze operational data.
  • Developed Spark jobs, Hive jobs to encapsulate and transform data.
  • Worked with the RDBMS to perform OLTP transformations using SQL.
  • Fine-tuned Spark applications to improve performance.
  • Worked collaboratively to manage build outs of large data clusters and real time streaming with Spark.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
  • Performance tuning the Spark jobs by changing the configuration properties and using broadcast variables.
  • Streamed data in real time using Spark with Kafka; responsible for handling streaming data from web server console logs (see the sketch after this list).
  • Performance-tuned long-running Greenplum user-defined functions: leveraged temporary tables to break the code into small subparts, loading intermediate results into a temp table and joining it later with the corresponding join tables. Refined table distribution keys based on data granularity and the primary key column combination.
  • Worked on numerous file formats like Text, Sequence files, Avro, Parquet, ORC, JSON, XML files and flat files using Map Reduce programs.
  • Developed a daily process to do incremental imports of data from DB2 and Teradata into Hive tables using Sqoop.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Resolved performance issues in Hive and Pig scripts by analyzing joins, grouping and aggregation and how they translate to Map Reduce jobs.
  • Worked with cross-functional consulting teams within the data science and analytics team to design, develop and execute solutions that determine business insights and solve clients' operational and strategic problems.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Extensively used HiveQL queries to query data in Hive tables and loaded data into HBase tables.
  • Extensively worked with partitions, dynamic partitioning and bucketed tables in Hive; designed both managed and external tables and worked on optimization of Hive queries.
  • Involved in collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
  • Assisted analytics team by writing Pig and Hive scripts to perform further detailed analysis of the data.
  • Designed Oozie workflows for job scheduling and batch processing.
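
The real-time Kafka/Spark Streaming bullet above maps to the direct-stream pattern available in the Spark 1.6.0 and Kafka stack listed in this project's environment. The sketch below is a minimal, hypothetical example of that pattern; the broker addresses, topic name, filter and output path are assumptions for illustration only.

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object ConsoleLogStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("console-log-stream")
        val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

        // Hypothetical broker list and topic name
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
        val topics      = Set("web-console-logs")

        // Receiver-less direct stream: one RDD partition per Kafka partition
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        // Keep only error lines from the console logs and persist each batch to HDFS
        stream.map(_._2)
          .filter(_.contains("ERROR"))
          .saveAsTextFiles("hdfs:///data/streams/console_errors")

        ssc.start()
        ssc.awaitTermination()
      }
    }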

Environment: Java 1.8, Scala 2.10.5, Apache Spark 1.6.0, MySQL, CDH 5.8.2, IntelliJ IDEA, Hive, HDFS, YARN, Map Reduce, Sqoop 1.4.3, Flume, Unix Shell Scripting, Python 2.6, Apache Kafka.

Confidential, Bothell, WA

Hadoop Administrator

Responsibilities:

  • Worked as a Hadoop admin responsible for everything related to clusters totaling 150 nodes, ranging from POC (proof-of-concept) to production clusters.
  • Experienced in installation, configuration, deployment, maintenance, monitoring and troubleshooting of Hadoop clusters in different environments such as development, test and production clusters using the Ambari front-end tool and scripts.
  • Created databases in MySQL for Hive, Ranger, Oozie, Dr. Elephant and Ambari.
  • Worked with Sqoop in Importing and exporting data from different databases like MySQL, Oracle into HDFS and Hive.
  • Worked on the Hortonworks distribution, a major contributor to Apache Hadoop.
  • Installed Apache NiFi to make data ingestion fast, easy and secure from the Internet of Anything with Hortonworks DataFlow.
  • Completed end-to-end design and development of an Apache NiFi flow which acts as the agent between the middleware team and the EBI team and executes all the actions mentioned above.
  • Hands on experience in installation, configuration, supporting and managing Hadoop Clusters.
  • Replaced retired Hadoop slave nodes through the AWS console and Nagios repositories.
  • Installed and configured Ambari metrics, Grafana, Knox, Kafka brokers on Admin Nodes.
  • Interacted with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
  • Experience with implementing High Availability for HDFS, Yarn, Hive and HBase.
  • Commissioning and Decommissioning Nodes from time to time.
  • Component unit testing using Azure Emulator.
  • Implemented NameNode automatic failover using the ZKFailoverController (ZKFC).
  • As a Hadoop admin, monitored cluster health status on a daily basis, tuned system performance-related configuration parameters and backed up configuration XML files.
  • Introduced SmartSense to obtain optimal recommendations from the vendor and to help troubleshoot issues.
  • Good experience with Hadoop Ecosystem components such as Hive, HBase, Pig and Sqoop.
  • Configured Kerberos and installed the MIT Kerberos ticketing system.
  • Secured the Hadoop cluster from unauthorized access through Kerberos, LDAP integration and TLS for data transfer among the cluster nodes.
  • Installed and configured CDAP, an ETL tool, in the development and production clusters.
  • Integrated CDAP with Ambari for easy operations monitoring and management.
  • Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
  • Monitor Hadoop cluster and proactively optimize and tune cluster for performance.
  • Experienced in defining job flows; enabled Ranger security on all the clusters.
  • Experienced in managing and reviewing Hadoop log files.
  • Connected to the HDFS using the third-party tools like Teradata SQL assistant using ODBC driver.
  • Installed Grafana for metrics analytics & visualization suite.
  • Installed various services like Hive, HBase, Pig, Oozie, and Kafka.
  • Monitored local file system disk space and CPU usage using Ambari.
  • Production support responsibilities include cluster maintenance.
  • Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
  • Engaged in requirements review meetings and interacted closely with business analysts to clarify specific scenarios.

Environment: HDP, Ambari, HDFS, MapReduce, Yarn, Hive, NiFi, Flume, PIG, Zookeeper, TEZ, Oozie, MYSQL, Puppet, and RHEL

Confidential

Java / SQL Developer

Responsibilities:

  • Involved in developing the UI pages using HTML, DHTML, CSS, JavaScript, JSON, jQuery, Ajax.
  • Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modelling, analysis, design and development.
  • Performed design work, participated in code reviews and wrote unit tests in Python.
  • Designed the database schema for the content management system; performed design and code reviews.
  • Wrote Python scripts to parse XML documents and load the data into the database (a sketch of this pattern follows this list).
  • Generated Use case diagrams, Activity flow diagrams, Class diagrams and Object diagrams in the design phase.
  • Responsible for entire data migration from Sybase ASE server to Oracle.
  • Migration of API code written for Sybase to Oracle.
  • Oversaw the migration activity of PL/SQL programs.
  • Migration of the PL/SQL code from Sybase to Oracle.
  • Migration of the data contained in the earlier ASPL Database from Sybase to Oracle.
  • Migrated the libraries written using Sybase APIs to Oracle's OCCI APIs.
  • Automated testing using Python.
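
The XML bullet above describes Python scripts that parse XML documents and load the rows into the database. Purely to keep the examples in this document in one language, the hypothetical sketch below shows the same parse-and-batch-load pattern in Scala (scala-xml plus plain JDBC), not the original Python; the element names, table and connection details are assumptions.

    import java.sql.DriverManager
    import scala.xml.XML

    object XmlLoader {
      def main(args: Array[String]): Unit = {
        // Hypothetical layout: <customers><customer id="1"><name>...</name></customer>...</customers>
        val doc  = XML.loadFile(args(0))
        val conn = DriverManager.getConnection("jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "secret")
        val stmt = conn.prepareStatement("INSERT INTO customers (id, name) VALUES (?, ?)")
        try {
          for (rec <- doc \\ "customer") {
            stmt.setLong(1, (rec \ "@id").text.toLong)
            stmt.setString(2, (rec \ "name").text.trim)
            stmt.addBatch()                       // batch the inserts to cut round trips
          }
          stmt.executeBatch()
        } finally {
          stmt.close()
          conn.close()
        }
      }
    }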

Environment: Python, Java, JDBC, XML, PL/SQL, SQL, web services.

Confidential

SQL Developer

Responsibilities:

  • Developed and deployed SSIS packages for ETL from OLTP and various sources to staging, and from staging to the data warehouse, using For Each Loop Container, Execute Package Task, Execute SQL Task, Send Mail Task, Lookup, Fuzzy Lookup, Derived Column, Conditional Split, Slowly Changing Dimension and more.
  • Extract, Transform and Load (ETL) source data into respective target tables to build the required data marts.
  • Involved in designing ETL as a part of Data warehousing and loaded data in to Fact tables using SSIS.
  • Supported the production environment by scheduling the packages and making them dynamic with SQL Server Package Configurations.
  • Developed SSIS packages using for each loop in Control Flow to process all excel files within folder, File System Task to move file into Archive after processing and Execute SQL task to insert transaction log data into the SQL table.
  • Deployed SSIS Package into Production and used Package configuration to export various package properties to make package environment independent.
  • Developed queries and stored procedures using T-SQL to be used by reports to retrieve information from the relational database and data warehouse.
  • Worked extensively with advanced analysis Actions, Calculations, Parameters, Background Images and Maps.
  • Created Common Table Expressions (CTEs) and temp tables to facilitate complex queries.
  • Generated and formatted Reports using Global Variables, Expressions and Functions for the reports. Designed and implemented stylish report layouts.
  • Developed Query for generating drill down and drill through reports in SSRS.
  • Designed new reports and wrote technical documentation, gathered requirements, analyzed data, developed and built SSRS reports and dashboard.

Environment: Microsoft SQL Server 2012, T-SQL, SSIS, SSAS
