Hadoop / Spark Developer Resume

Louisville, KY

SUMMARY

  • 4+ years of experience with Apache Hadoop components such as MapReduce, HDFS, Hive, Sqoop, Pig, Kafka, Flume, and Impala, and with Big Data analytics.
  • Hands-on expertise in Scala development, including Spark RDD and DataFrame programming.
  • Strong experience with application migration from RDBMS to Hadoop.
  • Sound grasp of relational database concepts; worked extensively with DB2 and Oracle. Expert in writing complex SQL queries and stored procedures.
  • Experience with real-time streaming using Apache Kafka and Spark Streaming.
  • Strong knowledge of Database architecture and Data Modeling including Hive and Oracle.
  • Excellent interpersonal and communication skills; technically competent and results-oriented, with problem-solving and leadership skills.
  • Sound understanding of Agile development and Agile Tools.
  • Experience leading projects across verticals such as Banking, Communications, Insurance, Retail & Hospitality, and Man-log.
  • Extensive knowledge of cloud technologies such as Microsoft Azure and AWS.

TECHNICAL SKILLS

Big Data: Hadoop, HDFS, MapReduce, Hive, Sqoop, Apache Spark, SparkSQL, Spark Streaming, HBase, YARN

Database: DB2, Oracle, SQL Server, MySQL, Hive

Hadoop Management: Cloudera Hadoop Distribution, HDInsight

Languages: SQL, Scala, Python, Shell Scripting

IDEs and Tools: IntelliJ, Eclipse, Maven, Bitbucket

PROFESSIONAL EXPERIENCE

Confidential

Hadoop / Spark Developer

Responsibilities:

  • Load the data from SQL Server, Oracle RDBMS to Hive using Sqoop.
  • Create Hive tables to store the processed results in a tabular format.
  • Develop Sqoop scripts to automate data loads between RDBMS databases and Hadoop.
  • Develop Apache Spark-based programs to implement complex business transformations.
  • Develop custom Java record readers, partitioners, and serialization techniques.
  • Use different data formats (Text, Avro, Parquet, JSON, ORC) while loading data into HDFS.
  • Create managed and external tables in Hive and load data from HDFS.
  • Run complex HiveQL queries on Hive tables for data profiling and reporting.
  • Optimize the Hive tables using optimization techniques such as partitions and bucketing to provide better performance with HiveQL queries.
  • Use Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Create partitioned tables and load data using both static and dynamic partitioning.
  • Create custom user-defined functions in Hive to implement special date functions.
  • Perform Sqoop imports from Oracle to load data into HDFS and directly into Hive tables.
  • Create and schedule Sqoop jobs for automated batch data loads.
  • Use JSON and XML SerDe Properties to load JSON and XML data into Hive tables.
  • Use Spark SQL and Spark DataFrames extensively to cleanse and integrate imported data for more meaningful insights.
  • Work with several source systems (RDBMS, HDFS, S3) and file formats (JSON, ORC, Parquet) to ingest, transform, and persist data in Hive for downstream consumption.
  • Build Spark applications using IntelliJ and Maven.
  • Work extensively in Scala for data engineering with Spark.
  • Schedule Spark jobs in the production environment using the Oozie scheduler.
  • Maintain Hadoop jobs (Sqoop, Hive, and Spark) in the production environment.

Big Data POCs

Confidential

Responsibilities:

  • As part of the Big Data adoption journey, participated in a couple of proofs of concept (POCs) that assessed the technical fit and performance of the Big Data tech stack (Sqoop, Hive, and Spark).
  • As part of the POC program, moved a set of Oracle tables to Hadoop and evaluated the data load process using Sqoop.
  • Migrated the associated business logic (PL/SQL procedures and functions) to Apache Spark DataFrame modules.
  • Created Hive tables parallel to the Oracle tables and evaluated Hive partitioning and bucketing.
  • Took part in a real-time streaming POC to load customer behavior data in real time using Kafka and Spark Streaming. Real-time customer web clickstream data was simulated to evaluate Hadoop's real-time ingestion and processing capability.

Environment: Cloudera 5.8, Spark 2.0, HDFS, MapReduce, Hive 2.0.1, Sqoop 1.4.6, Oozie Scheduler 4.3, YARN, Java, Linux Shell Scripting, Scala, Spark SQL, Impala 2.8, and Kafka.

Confidential

Hadoop Developer

Responsibilities:

  • Implemented a POC on the Hadoop stack and various big data analytics tools, including exports and imports between relational databases and HDFS.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Created Hive tables, loaded values, and generated ad hoc reports from the table data.
  • Showcased a strong understanding of Hadoop architecture, including HDFS, MapReduce, Hive, Pig, Sqoop, and Oozie.
  • Gathered business requirements in meetings for the successful implementation and proof of concept (POC) of the Hadoop cluster.
  • Loaded existing data warehouse data from an Oracle database to the Hadoop Distributed File System (HDFS).
  • Developed Oozie workflows for automating Sqoop, Hive and Pig scripts.
  • Managed and reviewed Hadoop log files.
  • Managed data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Used Sqoop to import data from Oracle into HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Created Hive tables and worked with them using HiveQL.
  • Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Held weekly meetings with technical collaborators and participated actively in code review sessions with senior and junior developers.

Environment: Hadoop, Hive, Pig, Flume, Oracle, Java, HBase, Oozie, Shell scripting, Amazon EMR

Confidential, Louisville, KY

Database Lead

Responsibilities:

  • DB2 database design and manipulation.
  • Production database capacity monitoring and performance analysis.
  • Performance tuning of DB2 SQL statements.
  • Capacity management of production DB2 objects.
  • REORGs, RUNSTATS, and RTS updates, proactively and as needed.
  • Monitoring large, growing partitions; adjusting key values or adding new partitions accordingly; and scheduling necessary maintenance.
  • Data rebalancing in large partitions.
  • Performance management of DB2 objects and engagement with technical support teams on performance and problem resolution.
  • Active involvement in client DR testing and the DB2 version migration project.
  • Introduced several automations to improve overall system performance.
  • Adding, deleting, and rebuilding indexes for performance improvements.
  • Database refreshes from Prod to TEST/QA regions and data movement.

Environment: z/OS, JCL, IBM DB2 V8 and V9.1 on z/OS, IBM Admin tools

Confidential

DB2 SME

Environment: z/OS, JCL, IBM DB2 V8 and V9.1 on z/OS, IBM Admin tools

Responsibilities:

  • Create, alter, and drop database objects.
  • Load data from Model Office and Prod regions to the Dev region using xloads.
  • Perform DBA checkout during the Version 9 migration project.
  • Execute online database reorganizations.
  • Participate in plan-specific mock conversion activities.
  • Tune application queries on request.
  • Provide technical support to the application development team.
  • Review application developers' code.
  • Create Manage now tickets for critical database changes in the Prod and MO regions.
  • Monitor DASD and raise requests for extra volumes with the MVS team.
  • Apply patches containing data modeling changes to MTV databases.

Confidential

Sr. DB2 DBA

Environment: z/OS, JCL, IBM DB2 V7.1/V8.1 on z/OS, BMC Master Mind tools

Responsibilities:

  • Create, alter, and drop database objects.
  • Control access to DB2 objects.
  • Implement and execute database backups.
  • Execute database recovery when needed using the RECOVER utility and DSN1COPY.
  • Execute database reorganizations (online and offline).
  • Resize tablespaces.
  • Tune application queries on request.
  • Provide technical support to the application development team.
  • Load and unload tablespaces.
  • Create partitioned tablespaces and tables.
