Hadoop / Spark Developer Resume Louisville, KY - Hire IT People

SUMMARY

4+ years of experience with Apache Hadoop components such as MapReduce, HDFS, Hive, Sqoop, Pig, Kafka, Flume, and Impala, and with Big Data analytics.
Hands-on expertise in Scala development, including Spark RDD and DataFrame programming.
Strong experience with application migration from RDBMS to Hadoop.
Sound grasp of relational database concepts, with extensive experience in DB2 and Oracle. Expert in writing complex SQL queries and stored procedures.
Experience with real-time streaming using Apache Kafka and Spark Streaming.
Strong knowledge of database architecture and data modeling, including Hive and Oracle.
Excellent interpersonal and communication skills; technically competent and results-oriented, with problem-solving and leadership skills.
Sound understanding of Agile development and Agile tools.
Experience leading projects across verticals such as Banking, Communications, Insurance, Retail & Hospitality, and Man-log.
Extensive knowledge of cloud technologies such as Microsoft Azure and AWS.

TECHNICAL SKILLS

Big Data: Hadoop, HDFS, MapReduce, Hive, Sqoop, Apache Spark, SparkSQL, Spark Streaming, HBase, YARN

Database: DB2, Oracle, SQL Server, MySQL, Hive

Hadoop Management: Cloudera Hadoop Distribution, HDInsight

Languages: SQL, Scala, Python, Shell Scripting

IDEs and Tools: IntelliJ, Eclipse, Maven, Bitbucket

PROFESSIONAL EXPERIENCE

Confidential

Hadoop / Spark Developer

Responsibilities:

Load the data from SQL Server, Oracle RDBMS to Hive using Sqoop.
Create Hive tables to store the processed results in a tabular format.
Develop Sqoop scripts to automate data loads between RDBMS databases and Hadoop.
Develop Apache Spark based programs to implement complex business transformations.
Develop Java custom record reader, partitioner and serialization techniques.
Use different data formats (Text, Avro, Parquet, JSON, ORC) while loading the data into HDFS.
Create managed and external tables in Hive and load data from HDFS.
Perform complex HiveQL queries on Hive tables for data profiling and reporting
Optimize the Hive tables using optimization techniques such as partitions and bucketing to provide better performance with HiveQL queries.
Use Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
Create partitioned tables and load data using both static and dynamic partitioning.
Create custom user defined functions in Hive to implement special date functions
Perform Sqoop imports from Oracle to load data into HDFS and directly into Hive tables.
Create and schedule Sqoop jobs for automated batch data loads.
Use JSON and XML SerDe properties to load JSON and XML data into Hive tables.
Use Spark SQL and Spark DataFrames extensively to cleanse and integrate imported data and derive more meaningful insights.
Work with several source systems (RDBMS, HDFS, S3) and file formats (JSON, ORC, Parquet) to ingest, transform, and persist data in Hive for downstream consumption.
Build Spark applications using IntelliJ and Maven.
Work extensively in Scala for data engineering with Spark.
Schedule Spark jobs in the production environment using the Oozie scheduler.
Maintain Hadoop jobs (Sqoop, Hive, and Spark) in the production environment.

Big Data POCs

Confidential

Responsibilities:

As part of the Big Data adoption journey, participated in a couple of proofs of concept (POCs). The POCs involved technical and performance assessment of the Big Data tech stack (Sqoop, Hive, and Spark).
As part of the POC program, moved a set of Oracle tables to Hadoop and evaluated the data load process using Sqoop.
Migrated associated business logic (PL/SQL procedures and functions) to Apache Spark DataFrame modules.
Created parallel Hive tables equivalent to the Oracle tables and evaluated Hive partitioning and bucketing.
Involved in a real-time streaming POC to load customer behavior data using Kafka and Spark Streaming. Real-time customer web clickstream data was simulated to evaluate Hadoop's real-time ingestion and processing capability.

Environment: Cloudera 5.8, Spark 2.0, HDFS, MapReduce, Hive 2.0.1, Sqoop 1.4.6, Oozie Scheduler 4.3, YARN, Java, Linux Shell Scripting, Scala, Spark SQL, Impala 2.8, Kafka

Confidential

Hadoop Developer

Responsibilities:

Implemented a POC on the Hadoop stack and various big data analytics tools, including exports and imports between relational databases and HDFS.
Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
Created Hive tables, loaded values, and generated ad hoc reports from the table data.
Showcased strong understanding of Hadoop architecture, including HDFS, MapReduce, Hive, Pig, Sqoop, and Oozie.
Gathered business requirements in meetings for successful implementation and POC (Proof-of-Concept) of Hadoop Cluster.
Loaded existing data warehouse data from Oracle database to Hadoop Distributed File System (HDFS).
Developed Oozie workflows for automating Sqoop, Hive and Pig scripts.
Managed and reviewed Hadoop log files.
Managed data coming from different sources.
Supported MapReduce programs running on the cluster.
Installed and configured Pig and wrote Pig Latin scripts.
Imported data from Oracle to HDFS using Sqoop on a regular basis.
Developed scripts and batch jobs to schedule various Hadoop programs.
Wrote Hive queries for data analysis to meet business requirements.
Created Hive tables and worked with them using HiveQL.
Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.

Environment: Hadoop, Hive, Pig, Flume, Oracle, Java, HBase, Oozie, Shell scripting, Amazon EMR

Confidential, Louisville, KY

Database Lead

Responsibilities:

DB2 Database design and manipulations.
Production database capacity monitoring & performance analysis
Performance tuning of DB2 SQL statements.
Capacity management of production DB2 objects
REORGs, RUNSTATS, and RTS updates, both proactively and as needed
Monitoring large, growing partitions; adjusting key values or adding new partitions accordingly; and scheduling necessary maintenance
Data rebalancing in large sized partitions
Performance management of DB2 objects and engaging with technical support teams on performance and problem resolution
Actively involved in client DR testing and DB2 version migration project
Introduced several automations to improve overall system performance
Add/Delete/Rebuild Indexes for performance improvements
Database refresh from Prod to TEST/QA regions and data movement.

Environment: z/OS, JCL, IBM DB2 V8 and V9.1 on z/OS, IBM Admin tools

Confidential

DB2 SME

Environment: z/OS, JCL, IBM DB2 V8 and V9.1 on z/OS, IBM Admin tools

Responsibilities:

Create, alter, and drop database objects.
Load data from Model office and Prod region to Dev region using xloads.
Perform DBA checkout during Version 9 migration project.
Execute database online reorganizations.
Participate in Plan specific Mock conversion activities.
Tune application queries on request.
Technical support to Application Development Team.
Review application developers' code.
Create Manage now tickets for critical database changes in the Prod and MO regions.
Monitor DASD and raise requests for extra volumes with the MVS team.
Apply patches having data modeling changes to MTV databases.

Confidential

Sr. DB2 DBA

Environment: z/OS, JCL, IBM DB2 V7.1/V8.1 on z/OS, BMC Master Mind tools

Responsibilities:

Create, alter, and drop database objects
Controlling access to DB2 Objects
Implement and execute database backup
Execute database recovery when needed using RECOVER and DSN1COPY
Execute database reorganizations (Online and Offline)
Resizing of Tablespaces
Tune application queries on request
Technical support for Application Development Team
Load and unload tablespaces
Create partitioned tablespaces and tables
