Apache Spark Consultant Resume

Redwood City, CA

PROFESSIONAL SUMMARY:

  • 8 years of IT experience, including 3 years in Big Data Analytics covering the analysis, design, development, deployment, and maintenance of projects on the Apache Hadoop technology stack - HDFS, MapReduce, YARN, Hive, Impala, Hue, Sqoop, Spark, Scala, Oozie, and the Hadoop APIs.
  • Experience in data ingestion, processing, and analysis using Spark with Scala, Sqoop, and shell scripts, with Hive for storage.
  • Proficient in developing Sqoop jobs that migrate data between RDBMS and Hive/HDFS, with working experience in both historical and incremental loads.
  • Thorough knowledge of Hadoop architecture and core components: NameNode, DataNodes, JobTracker, TaskTrackers, Oozie, Hue, Flume, HBase, etc.
  • Strong experience with partitioning and bucketing concepts in Hive; designed both managed and external tables to optimize performance (see the sketch following this summary).
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for data analysis.
  • Experience extending Hive and Spark core functionality with custom user-defined functions (UDFs).
  • Excellent knowledge of data analysis, validation, cleansing, and verification, and of identifying data mismatches.
  • Experienced with the Apache Spark ecosystem, using Spark SQL and Scala queries against data file formats such as CSV, Parquet, ORC, and JSON.
  • Experience analyzing large-scale data to identify insights, trends, and relationships, with a strong focus on data clustering.
  • An excellent team player with strong organizational, interpersonal, communication, and leadership skills; a quick learner with a positive attitude and flexibility toward an ever-changing industry.
  • Technically strong, with the ability to work with business users, project managers, team leads, architects, and peers, maintaining a healthy project environment.
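
A minimal sketch of the partitioned, bucketed external Hive table pattern referenced above, created through Spark's Hive support; the database, table, columns, and HDFS path are hypothetical placeholders, not taken from an actual engagement:

    import org.apache.spark.sql.SparkSession

    // Hive-enabled session so the DDL is recorded in the Hive metastore
    val spark = SparkSession.builder()
      .appName("hive-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table: Hive tracks only metadata; the data stays at the HDFS location.
    // Partitioning prunes whole directories at query time; bucketing clusters rows
    // by store_id to speed up joins and sampling on that key.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS sales.transactions (
        txn_id   BIGINT,
        store_id INT,
        amount   DOUBLE
      )
      PARTITIONED BY (txn_date STRING)
      CLUSTERED BY (store_id) INTO 32 BUCKETS
      STORED AS PARQUET
      LOCATION '/data/warehouse/sales/transactions'
    """)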

TECHNICAL SKILLS:

Distributed Computing: Apache Hadoop 2.x, HDFS, YARN, MapReduce, Hive, HBase, Sqoop, ZooKeeper, Hue, Impala, Kafka, Spark

SDLC Methodologies: Agile Scrum

Relational Databases: Oracle, MySQL

Distributed Databases: NoSQL - HBase

Distributed Filesystems: HDFS, Amazon S3

Distributed Query Engines: Hive, Spark SQL, Impala

Distributed Computing Environment: Cloudera, MapR

Operating Systems: Windows, Mac OS, Unix, Ubuntu

Programming: Java, Python, Scala, UNIX shell, Pig Latin, HiveQL

Scripting: Shell Scripting

Version Control: GitHub

IDE: Scala IDE, PyCharm

PROFESSIONAL EXPERIENCE:

Apache Spark Consultant

Confidential, Redwood City, CA

Responsibilities:

  • Collaborated with the business team to understand the requirements and baseline the data flow
  • Created scripts to cleanse and validate the data and further standardize it
  • Developed Spark/Python applications that build DataFrames and RDDs to enrich the data, using joins, filters, JSON extraction, and duplicate removal for business-logic processing (see the sketch after this list).
  • Developed and used UDFs in Hive and Spark
  • Extracted data from the Hadoop ecosystem to an RDBMS using Sqoop export.
  • Created Hive schemas using performance techniques such as partitioning and bucketing.
  • Wrote extensive Hive queries to do transformations on the data to be used by downstream models
  • Migrated a database application from SQL Server to Hadoop; used Sqoop for both history and incremental data loads (see the Sqoop sketch below).
  • Made heavy use of PySpark and Python for data manipulation and transformation.
  • Worked in an Agile development environment.
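
A minimal sketch of the DataFrame enrichment style described above. This engagement used PySpark; Scala is shown here for consistency with the other sketches in this résumé, and the DataFrame API is analogous. Paths, columns, and names are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, get_json_object}

    val spark = SparkSession.builder().appName("enrichment-sketch").getOrCreate()

    val orders  = spark.read.parquet("/data/raw/orders")   // placeholder inputs
    val clients = spark.read.parquet("/data/raw/clients")

    val enriched = orders
      .filter(col("status") === "COMPLETE")                 // keep completed records only
      .withColumn("channel",                                // pull one field out of a JSON payload column
        get_json_object(col("payload"), "$.channel"))
      .join(clients, Seq("client_id"), "left")              // enrich with client attributes
      .dropDuplicates("order_id")                           // remove duplicate events

    enriched.write.mode("overwrite").parquet("/data/enriched/orders")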

Environment: Cloudera CDH 5.8, Hadoop, HBase, Spark, Python, MapReduce, HDFS, Hive, Sqoop, Impala, Data Lake, Linux, Shell Scripting, Tableau, Oracle
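
The Sqoop history-plus-incremental pattern used in the SQL Server migration above follows this general shape; shown as shell commands since Sqoop is a CLI tool, with every connection string, table, column, and value being a placeholder rather than a real system:

    # One-time history load straight into a Hive table
    sqoop import \
      --connect "jdbc:sqlserver://dbhost:1433;databaseName=sales" \
      --username etl_user --password-file /user/etl/.dbpass \
      --table transactions \
      --hive-import --hive-table sales.transactions \
      --num-mappers 8

    # Recurring incremental append keyed on a monotonically increasing column
    sqoop import \
      --connect "jdbc:sqlserver://dbhost:1433;databaseName=sales" \
      --username etl_user --password-file /user/etl/.dbpass \
      --table transactions \
      --incremental append --check-column txn_id --last-value 1500000 \
      --target-dir /data/landing/transactions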

Apache Hadoop & Spark Developer

Confidential, Minneapolis, MN

Responsibilities:

  • Transformed and analyzed large data sets using Hive queries and Apache Spark with Scala.
  • Used Spark RDDs, DataFrames, and Datasets for data transformation and processing.
  • Converted Hive/SQL queries into Spark SQL for better performance.
  • Developed Spark jobs using Scala for Batch analysis as per the business requirements.
  • Wrote efficient Hive queries with partitions and bucketing for performance improvement.
  • Developed Hive scripts to roll up transactional data at the subclass/location level to attributes such as week, month, and year, computing stock-ledger aggregates and storing them in HBase (a sketch follows this list).
  • Developed UDFs and used them in Hive Queries.
  • Ingested data from relational databases into HDFS and Hive tables using Sqoop.
  • Worked with Hadoop file formats such as Parquet, ORC, Avro, and JSON, and with various compression techniques.
  • Wrote Pig scripts for data cleansing.
  • Good understanding of HBase architecture and core concepts.
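
A minimal Scala sketch of the UDF-plus-rollup pattern from this role, with the Hive logic rewritten as Spark SQL. All names are placeholders, the week derivation is dummy logic, and the aggregates land in a Hive table here to keep the sketch self-contained (the project itself stored them in HBase):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("stock-ledger-rollup")
      .enableHiveSupport()
      .getOrCreate()

    // Custom UDF extending Spark SQL; year-month prefix stands in for real fiscal logic.
    spark.udf.register("fiscal_week", (txnDate: String) => txnDate.take(7))

    // Roll transactional rows up to subclass/location/week and compute aggregates.
    val weekly = spark.sql("""
      SELECT subclass, location, fiscal_week(txn_date) AS week,
             SUM(sales_amt) AS sales,
             SUM(units)     AS units
      FROM retail.transactions
      GROUP BY subclass, location, fiscal_week(txn_date)
    """)

    weekly.write.mode("overwrite").saveAsTable("retail.stock_ledger_weekly")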

SQL BI Developer

Confidential, Canton, MA

Responsibilities:

  • Created stored procedures, triggers, indexes, user-defined functions, constraints, etc. on various database objects using T-SQL to obtain the required results.
  • Actively involved in writing T-SQL to implement stored procedures, functions, cursors, and views for different tasks.
  • Handled errors, debugged coding issues, and troubleshot production problems.
  • Involved in Performance tuning of ETL transformations, data validations and stored procedures.
  • Generated database monitoring and data validation reports in SQL Server Reporting Services (SSRS).
  • Designed and developed reports using SQL Server Reporting Services.
  • Developed stored procedures and views to supply data to the reports.
  • Developed various SSRS report types: tabular reports, matrix (cross-tab/pivot) reports, charts and graphs, aggregating reports, parameterized and grouped reports, reports with totals and subtotals, drill-down reports, and ad hoc reports.
  • Used Oracle, SQL, T-SQL, and PL/SQL in the test environment.
  • Configured, built, and deployed reports in PDF, Excel, and TIFF formats according to business requirements using SQL Server 2012 Reporting Services.
  • Developed SSRS reports whose data was extracted from SSAS cubes.
  • Designed cubes, star schemas, and data models for data warehousing applications.
  • Designed and deployed SSRS reports with drill-down, drill-through, drop-down menu options, and sub-reports using BIDS.
  • Report parameters included single-valued parameters and various parameter types such as hidden, internal, and default (queried and non-queried).
  • Created complex stored procedures to serve as datasets for report design and to generate ad hoc SSRS reports (see the sketch after this list).
  • Developed complex SSRS reports involving sub-reports (matrix/tabular reports, charts and graphs, bar, column, and pie charts).
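
A minimal T-SQL sketch of a parameterized stored procedure of the kind used as an SSRS report dataset here; the procedure name, table, columns, and parameters are hypothetical:

    -- Placeholder procedure: returns aggregated rows for a report dataset
    CREATE PROCEDURE dbo.usp_SalesSummary
        @StartDate DATE,
        @EndDate   DATE
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT s.Region,
               SUM(s.Amount) AS TotalSales,
               COUNT(*)      AS OrderCount
        FROM   dbo.Sales AS s
        WHERE  s.OrderDate BETWEEN @StartDate AND @EndDate
        GROUP  BY s.Region;
    END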

Environment: SQL Server, SQL Server Management Studio, SQL, PL/SQL, Oracle 9i, ETL, Integration Services (SSIS), Reporting Services (SSRS), Data Warehousing, T-SQL, Windows
