
Big Data Developer Resume


San Jose, CA

PROFESSIONAL SUMMARY:

  • 6+ years of overall professional work experience, including 4+ years as a Hadoop developer designing and implementing complete end-to-end Hadoop-based data analytics solutions using HDFS, MapReduce, Spark, YARN, Scala, Hive, Oozie, Impala, and HBase.
  • Good experience in creating data ingestion workflows, ETL frameworks, data transformations, and data management at an enterprise level.
  • Expert-level understanding of Hadoop HDFS, Spark, and MapReduce internals.
  • Knowledge of RDBMS databases such as Oracle, SQL Server, MySQL, and DB2.
  • Experience in analyzing data using HiveQL, HBase, and custom MapReduce programs in Java.
  • Good knowledge of YARN configuration.
  • Good understanding of cloud-based technologies such as GCP and AWS.
  • Hands-on experience with Snowflake and GCP.
  • Experience in shell and Python scripting.
  • Knowledge of NoSQL databases such as HBase.
  • Working knowledge of multi-tiered distributed environments and OOAD concepts, with a good understanding of the Software Development Life Cycle (SDLC).
  • Proficient in development methodologies such as Agile, Scrum, and Waterfall.
  • Experience in Apache Spark and Spark SQL.
  • Basic working knowledge of the ELK (Elasticsearch, Logstash, Kibana) stack.
  • Used the DataFrame API in Apache Spark to load CSV, JSON, and Parquet files (see the sketch after this list).
  • Experience with the Oozie workflow engine, running workflows with actions for Hadoop MapReduce, Hive, Spark, and shell jobs.
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
  • Good knowledge of building Apache Spark applications using Scala and Python.
  • Good understanding of NoSQL databases and hands-on experience writing applications on HBase, MongoDB, and Cassandra.
  • Experience with the Cloudera, MapR, and Hortonworks distributions.
  • Knowledge of Cloudera Navigator for data discovery, continuous optimization, audit, lineage, metadata management, and policy enforcement.
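
A minimal sketch of the DataFrame API usage mentioned in the summary above, assuming Spark 2.x; the application name and file paths are hypothetical placeholders rather than actual project values.

import org.apache.spark.sql.SparkSession

object DataFrameLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataframe-load-sketch")
      .getOrCreate()

    // CSV with a header row, letting Spark infer the column types
    val csvDf = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/landing/records.csv")                // hypothetical path

    // JSON (one record per line) and Parquet need no extra options
    val jsonDf = spark.read.json("/data/landing/events.json")            // hypothetical path
    val parquetDf = spark.read.parquet("/data/curated/records_parquet")  // hypothetical path

    csvDf.printSchema()
    println(s"CSV: ${csvDf.count()}, JSON: ${jsonDf.count()}, Parquet: ${parquetDf.count()}")
    spark.stop()
  }
}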

TECHNICAL SKILLS:

Programming Languages: SQL, C, Java, Scala, HTML, Verilog, VHDL, Python

Big Data Technologies: Hadoop, MapReduce, Spark, Oozie, Hive, Pig, Sqoop, HBase.

Databases: MySQL, MSSQL, Oracle, HBase, MongoDB.

Software tools: Eclipse IDE, MATLAB, Xilinx, Micro Controller, Power BI, Postman

Operating Systems: Windows, UNIX/Linux.

Methodologies: Scrum, Agile, Waterfall.

PROFESSIONAL EXPERIENCE:

Confidential, San Jose, CA

Big Data Developer

Responsibilities:

  • Involved in requirement gathering for collecting all the required data and pushing it into the big data environment; worked on the design and development of data engineering pipelines using Spark/Scala and Hive.
  • Used Cscope, tags, Code Query, and SQLite to analyze source code and generate function-relationship data across 400 GB of source code files.
  • Developed Spark/Scala scripts to establish function-to-file interaction mappings from Code Query generated datasets.
  • Used Sqoop for Hive data ingestion from an Oracle database.
  • Analyzed and processed bug and enclosure data.
  • Processed large dump files and extracted key commit-level information.
  • Developed a PySpark script to load data into MongoDB using the MongoDB Spark connector (a comparable Scala sketch follows this list).
  • Developed Python scripts to compute KLOC (thousands of lines of code) from enclosures for each bug.
  • Worked on performance tuning of Spark/YARN configurations to achieve optimal performance and reduce the number of shuffle operations (see the tuning sketch after this list).
  • Standardized Hive tables on ORC and Parquet storage formats.
  • Developed a Python script to update the regression flag in CDETS (Cisco Defect and Enhancement Tracking System) using its write-back API.
  • Developed a POC for migrating the project from an on-premises MapR Hadoop system to GCP/Snowflake.
  • Used Tidal for job scheduling.
  • Used Git as the code repository.
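
A minimal sketch of the Spark/YARN tuning and ORC standardization mentioned above; the configuration values, database, and table names are illustrative assumptions rather than the actual production settings. Repartitioning on a downstream join key before the write is one common way to reduce shuffle volume in later jobs.

import org.apache.spark.sql.{SaveMode, SparkSession}

object TableStandardizationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-table-standardization-sketch")
      // Illustrative tuning knobs; real values depend on cluster size and data volume
      .config("spark.sql.shuffle.partitions", "400")
      .config("spark.executor.memory", "8g")
      .config("spark.executor.cores", "4")
      .config("spark.dynamicAllocation.enabled", "true")
      .enableHiveSupport()
      .getOrCreate()

    val bugs = spark.table("raw_db.bugs")               // hypothetical source table

    // Repartition on the key used by downstream joins, then standardize the storage format
    bugs.repartition(bugs("component"))
      .write
      .mode(SaveMode.Overwrite)
      .format("orc")                                    // or "parquet" for the Parquet-standardized tables
      .saveAsTable("curated_db.bugs_orc")               // hypothetical target table

    spark.stop()
  }
}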
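
The MongoDB load above was implemented in PySpark; the sketch below shows the comparable idea in Scala, assuming the mongo-spark-connector 2.x API. The connection URI, database, collection, and source table are hypothetical.

import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.SparkSession

object MongoLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-mongodb-sketch")
      // Output URI format: mongodb://<host>:<port>/<database>.<collection> (placeholder values)
      .config("spark.mongodb.output.uri", "mongodb://mongo-host:27017/codeanalytics.function_map")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table holding the function-to-file interaction mapping
    val functionMap = spark.table("curated_db.function_file_map")

    // MongoSpark.save writes the DataFrame to the collection configured above
    MongoSpark.save(functionMap)

    spark.stop()
  }
}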

Confidential, Phoenix, AZ

Big Data Developer

Responsibilities:

  • Worked on the development and modification of the existing EDP (Batch & API) application and its onboarding in Arena, the enterprise-wide integration testing platform.
  • Handled importing enterprise data from different data sources, performing transformations using Hive, Spark, and MapReduce, and then loading the data into HBase tables.
  • Implemented business logic for PII encryption and decryption in Hive using generic UDFs.
  • Developed Hive scripts and Spark programs to read, transform, and store structured data from various sources into the Hadoop file system while ensuring performance, data quality, and the security of PII data.
  • Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries (see the UDF sketch after this list).
  • Assigned the task of quality assurance for the entire EDP module.
  • Involved in moving log files generated from various sources using Splunk.
  • Used Oozie for workflow orchestration and scheduling to manage Hadoop jobs.
  • Worked on EDP API CSRT (Cornerstone Real Time) onboarding.
  • Worked on HBase table design (row keys, column families, columns, etc.) and ingested data into HBase tables.
  • Supported the EDP, CMPH, and Deviceid applications.
  • Deployed applications using BDEC Jenkins, CI Jenkins, and XL Release on the EPAAS cloud and Tomcat servers.
  • Used Apache JMeter for load testing of APIs; developed an automation framework for testing the batch process.
  • Used Bitbucket as the code repository.
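
A minimal sketch of a Spark UDF used from the DataFrame API for an aggregation query, as mentioned above. The SHA-256 masking function and the table/column names are illustrative assumptions; the actual PII encryption and decryption logic lived in Hive generic UDFs and is not reproduced here.

import java.security.MessageDigest

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object PiiMaskUdfSketch {
  // One-way hash of a PII value (illustrative only; production used reversible encryption)
  private def sha256(value: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(Option(value).getOrElse("").getBytes("UTF-8"))
      .map("%02x".format(_)).mkString

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pii-mask-udf-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val maskPii = udf(sha256 _)                   // for the DataFrame API
    spark.udf.register("mask_pii", sha256 _)      // for use inside Spark SQL / HiveQL strings

    // Hypothetical table and column: aggregate on the masked value rather than the raw PII
    spark.table("edp_db.transactions")
      .withColumn("masked_ssn", maskPii(col("ssn")))
      .groupBy("masked_ssn")
      .count()
      .show(10, truncate = false)

    spark.stop()
  }
}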

Confidential, Long Beach, CA

Big Data Developer

Responsibilities:

  • Responsible for designing and building a centralized Core Data Platform (CDP), a data lake holding Confidential's cross-sourced datasets and making them available for enterprise-wide use in a single place.
  • Built Spark pipelines on top of the CDP for data analytics applications.
  • Designed a near-real-time analytics and ingestion platform using Spark and Scala to load large datasets into Hive/HBase and reduce the time taken by the regular Talend process.
  • Used HBase bulk load for larger datasets and the HBase Java API for incremental loads (see the sketch after this list).
  • Partitioned Hive tables and used file formats such as Parquet and Avro for optimized storage.
  • Used the Cloudera 5.13 Hadoop distribution.
  • Handled large datasets (over 1 billion records) such as medical and pharmacy claims, and built a semantic layer on top of all claims in Molina using Spark and Scala.
  • Built the Mosaic application, which collects member phone numbers from different sources (Hive); the final Hive dataset is consumed by a front-end application through Impala, helping the company maintain a unified directory of all its members.
  • Worked on the PDI (Provider Data Integrity) application, which is used to clean up provider data.
  • Built the Tiering Analytics application, which tiers all providers associated with Molina into levels 0, 1, 2, and 3 based on provider performance against financial and quality metrics; this data helps the company auto-assign providers to new members.
  • Worked on performance tuning of Spark/YARN configurations to achieve optimal performance and reduce the number of shuffle operations.
  • Used Power BI to generate reports and perform analytics on the pipeline datasets, with Impala as the query engine for Power BI.
  • Developed a Scala application for daily tracking and status monitoring of all pipeline runs.
  • Built operational controls on top of the pipelines, such as error handling, logging, and email notifications.
  • Worked on state plan reporting from the data lake for a new state implementation.
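
A minimal sketch of an incremental load through the HBase Java API, as referenced above. The ZooKeeper quorum, table name, row key, and column family/qualifiers are hypothetical placeholders; production loads batched many Puts per call and used HBase bulk load for the larger datasets.

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object IncrementalHBaseLoadSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "zk-host-1,zk-host-2")           // placeholder quorum

    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("member_claims")) // hypothetical table

    try {
      // Row key and cells are illustrative; the real row-key design combined member and claim ids
      val put = new Put(Bytes.toBytes("memberId#claimId"))
      put.addColumn(Bytes.toBytes("c"), Bytes.toBytes("claim_amount"), Bytes.toBytes("125.40"))
      put.addColumn(Bytes.toBytes("c"), Bytes.toBytes("claim_status"), Bytes.toBytes("PAID"))
      table.put(put)
    } finally {
      table.close()
      connection.close()
    }
  }
}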

Confidential, Atlanta, GA

Big Data Developer

Responsibilities:

  • Involved in requirement gathering and business analysis, and translated business requirements into technical designs in Hadoop and big data.
  • Involved in data ingestion into the data lake using Ab Initio.
  • Used the Cloudera 5.9 Hadoop distribution.
  • Designed and implemented a dynamic discovery component that automatically detects changes in source-specific metadata from HDFS and maintains this information in HBase.
  • Developed a generic schema-generation module to create flattened Hive Parquet tables based on the Avro schema.
  • Partitioned data present in HDFS using Hive.
  • Used the Databricks spark-avro package to read and parse Avro files, with custom changes implemented in the spark-avro source code to convert logical types to their actual data types.
  • Involved in transforming and flattening Avro data and loading it into Hive tables in Parquet format using Spark DataFrames (see the sketch after this list).
  • Developed a component that maintains an HBase-based audit table for job auditing, table-level processing status (success/failed), and daily reporting.
  • Implemented schema evolution to add/delete columns in Hive tables based on source-system Avro schema changes.
  • Used Spark SQL in the lake processing job, as it makes it easy to combine loading and querying data.
  • Built a scheduling framework using Oozie coordinators and YARN priority queues for lake processing.
  • Dynamically generated Oozie workflows using Scala code to launch multiple Spark jobs simultaneously.
  • Used HBase for maintaining metadata, audit, and runtime processing information.
  • Developed many Spark applications using Scala for data lake transitions.
  • Created a Hive generic UDF for masking sensitive columns based on metadata present in an HBase table.
  • Generated audit information for SOX compliance.
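
A minimal sketch of the Avro-to-flattened-Parquet flow described above, assuming the Databricks spark-avro package (format "com.databricks.spark.avro"). The input path, database, and table names are hypothetical, and the flattening only expands struct columns; real schemas may also need array and map handling.

import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

object AvroToParquetSketch {
  // Recursively expand struct columns (e.g. address.city -> address_city) so the Hive table is flat
  def flatten(df: DataFrame): DataFrame = {
    val cols = df.schema.fields.flatMap { f =>
      f.dataType match {
        case st: StructType => st.fieldNames.toSeq.map(n => col(s"${f.name}.$n").alias(s"${f.name}_$n"))
        case _              => Seq(col(f.name))
      }
    }
    val flat = df.select(cols: _*)
    if (flat.schema.fields.exists(_.dataType.isInstanceOf[StructType])) flatten(flat) else flat
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-to-parquet-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // spark-avro reads the Avro files landed in HDFS (hypothetical path)
    val raw = spark.read.format("com.databricks.spark.avro").load("/data/lake/raw/customers_avro")

    flatten(raw)
      .write
      .mode(SaveMode.Overwrite)
      .format("parquet")
      .saveAsTable("lake_db.customers_flat")             // hypothetical flattened Hive Parquet table

    spark.stop()
  }
}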

Confidential, Cary, NC

Software Analyst

Responsibilities:

  • Developed many MapReduce jobs to pre-process data in HDFS.
  • Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
  • Partitioned the data using Hive and stored it in ORC format (see the sketch after this list).
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Used ORC, SequenceFile, Avro, and Parquet file formats to improve the performance of Hive queries.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Involved in loading data from the UNIX file system into HDFS.
  • Used Hive UDFs to perform complex transformations.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Implemented workflows using Oozie for running MapReduce jobs.
  • Hands-on experience working with structured, semi-structured, and unstructured data.
  • Used Jira for trouble tickets and Confluence for the knowledge base.
  • Exposure to the Struts, Spring, Hibernate, and JavaServer Faces frameworks.
  • Exposure to and some development with Enterprise JavaBeans, Servlets, JSP, JSF, JavaScript, jQuery, Oracle, HTML, and CSS.
  • Used the Commons Logging and Log4j logging frameworks.
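
A minimal sketch of the partitioned ORC table layout described above. The original DDL and loads were written directly in HiveQL; this sketch issues equivalent statements through Spark's Hive support. The database, table, and column names are hypothetical, and the bucketing (CLUSTERED BY ... INTO N BUCKETS) used for bucket-based joins is assumed to stay in the Hive-side DDL, since Spark restricts writes to bucketed Hive tables.

import org.apache.spark.sql.SparkSession

object PartitionedOrcTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-orc-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned Hive table stored as ORC; state is the partition column
    spark.sql(
      """CREATE TABLE IF NOT EXISTS warehouse_db.policies_orc (
        |  policy_id   STRING,
        |  holder_name STRING,
        |  premium     DOUBLE
        |)
        |PARTITIONED BY (state STRING)
        |STORED AS ORC""".stripMargin)

    // Allow dynamic partitioning so each state lands in its own partition on insert
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    spark.sql(
      """INSERT OVERWRITE TABLE warehouse_db.policies_orc PARTITION (state)
        |SELECT policy_id, holder_name, premium, state
        |FROM staging_db.policies_raw""".stripMargin)

    spark.stop()
  }
}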
