Senior Big Data Engineer Resume


Mountain View, CA

SUMMARY:

  • A goal-oriented and enthusiastic professional with a versatile skill set and high learning agility, looking to apply experience with large-scale big data methods and data pipelines as a Data Engineer at a world-class, high-integrity company
  • Holds a B.E. in Electrical & Electronics Engineering
  • Ingests data into HDFS from heterogeneous sources such as S3, the web via REST API calls, SQL Server via BCP, and Oracle via Sqoop
  • Designs and develops ETL using a variety of tools, including PySpark, Bash scripting, and Hive, to integrate disparate systems
  • Possesses hands-on experience building ETL pipelines that ingest CSV/text/JSON data from the web into Hadoop HDFS and load it into Hive tables (see the sketch after this list)
  • Big data experience includes a strong understanding of the internals of the Hadoop ecosystem in Cloudera 5.11, including Apache Sqoop to ingest data from Oracle and SQL Server, and Oozie to schedule jobs
  • Fluent in Python, with a very good understanding of data structures and algorithms
  • Hands-on experience with large-scale big data methods, including Hadoop (worked with components including HDFS and Oozie), Spark, Hive (data transformation), Impala & Hue
  • Experience in schema modeling using Erwin and in creating mapping documents for ETL transformation and load
  • Experience handling structured and unstructured data, with the aim of providing clean, usable data
  • Worked with Amazon AWS S3 and EC2; served as a Database Architect providing load-balancing solutions (Oracle RAC); very knowledgeable in horizontal and vertical scaling, memory management, and disk maintenance
  • Used Git for version control and JIRA/Cherwell for ticketing
  • Provided on-call support for ETL job failures, delivering root-cause analysis (RCA), recommending long-term solutions, and meeting SLAs
  • Collaborated with Data Scientists, Analysts & TPMs to understand requirements and recommend ways to improve data reliability, efficiency, and quality
  • Employs excellent problem-solving and verbal/written communication skills in all interactions
  • Demonstrated ability to explain technical concepts in a manner that is easily understood by non-technical professionals
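
To make the ingestion experience above concrete, here is a minimal PySpark sketch of landing a vendor CSV feed from HDFS into a Parquet-backed Hive table. The paths, key column, and table names are hypothetical placeholders rather than the actual project's:

    # Minimal sketch: land a vendor CSV feed in HDFS and expose it as a Hive table.
    # Paths, key column, and table names are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("vendor-csv-ingest")
        .enableHiveSupport()   # read/write tables registered in the Hive metastore
        .getOrCreate()
    )

    # Read the raw CSV feed from its HDFS landing zone.
    raw = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("hdfs:///landing/vendor/customers/")   # hypothetical path
    )

    # Light cleanup before exposing the data downstream.
    clean = raw.dropDuplicates().na.drop(subset=["customer_id"])  # hypothetical key

    # Persist as a Parquet-backed Hive table for efficient retrieval.
    clean.write.mode("overwrite").format("parquet").saveAsTable("staging.customers")

The same pattern extends to JSON web feeds by swapping spark.read.csv for spark.read.json.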

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, AWS S3, Spark SQL, Hue, Cloudera Manager

Programming Languages: Python, Bash, SQL

Databases: Oracle, SQL Server, MongoDB

Operating Systems: Linux, Sun Solaris, AIX & Windows

Scheduling Tools: Oozie, Crontab

PROFESSIONAL EXPERIENCE:

Senior Big Data Engineer

Confidential - Mountain View, CA

  • Responsible for the Omnichannel Customer 360 project, which builds a single, complete, actionable view of each customer
  • Created a data lake by extracting data from heterogeneous sources: vendor flat files (CSV/Excel), databases via BCP/Sqoop, and the web via REST APIs (JSON)
  • Used Apache Hive/Impala to build ETL on HDFS data with dynamically partitioned tables for efficiency, providing data in the requested format for dashboards
  • Built aggregate layers using Hive/Impala queries to implement business logic for weekly/monthly reports (see the sketch after this list)
  • Designed ETL using internal/external tables, stored in Parquet format for efficiency
  • Improved performance through partitioning, data physicalization, and Hive parameter tuning
  • Developed scripts for a process-validation framework that sends data-quality exception reports and job-failure alerts
  • Developed a business-validation framework that validates source tables against dashboard views and reports exceptions
  • Modeled data using techniques such as dynamic partitioning and Parquet table storage for efficient storage and retrieval from HDFS
  • Used Bash scripting and Python to automate ETL, and scheduled ETL jobs in Oozie
  • Used JIRA for ticketing and Git for source control
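
A minimal sketch of the dynamic-partition Parquet pattern described in the bullets above, expressed as HiveQL run through spark.sql so it mirrors what Hive/Impala would execute; all database, table, and column names are hypothetical:

    # Minimal sketch: aggregate layer loaded into a dynamically partitioned
    # Parquet table. Database, table, and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("weekly-aggregate-layer")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Dynamic partitioning must be enabled before a partitioned INSERT.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    # Partitioned Parquet target: one partition per event date.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS mart.weekly_sales_agg (
            store_id   STRING,
            units_sold BIGINT,
            revenue    DOUBLE
        )
        PARTITIONED BY (event_date STRING)
        STORED AS PARQUET
    """)

    # Aggregate layer: business logic rolled up from the detail table.
    # The partition column goes last so Hive routes rows dynamically.
    spark.sql("""
        INSERT OVERWRITE TABLE mart.weekly_sales_agg PARTITION (event_date)
        SELECT store_id,
               COUNT(*)    AS units_sold,
               SUM(amount) AS revenue,
               event_date
        FROM   staging.sales_detail
        GROUP  BY store_id, event_date
    """)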

Confidential - San Ramon, CA

Senior Data Architect

  • Created a data pipeline that ingested cyber-attack information from the web using Python scripts and uploaded it to Hive via the Oozie scheduler for use by the Cybersecurity Analytics team (see the sketch after this list)
  • Designed managed and external Hive tables, stored in Parquet/Avro/ORC formats for efficient retrieval and compression
  • Contributed to database schema design using Erwin, and developed Hive tables to load data from heterogeneous systems, including Oracle, SQL Server, and the web
  • Partnered with the data science team to automate a machine-learning model for the bank's online-account attrition using shell scripts
  • Designed and executed Oozie workflows that scheduled Sqoop and Hive actions to extract, transform, and load data
  • Helped the analytics team devise a method for retrieving data from Oracle/SQL Server with complex ETL queries, importing it into Hive using Sqoop full and incremental strategies, and storing it in HDFS
  • Gained exposure to PySpark DataFrames and transformations for building machine-learning models
  • Acquired experience in SQL tuning and proficiency with Oracle tuning features such as SQL profiling and SQL hints
  • Teamed with Data Analysts and Data Scientists on several joint endeavors
  • Became familiar with machine-learning algorithms (linear regression, logistic regression, and random forests)
  • Deployed data replication from SQL Server to Hive for the Risk Analytics team using Oracle GoldenGate
  • Set up a sharded MongoDB cluster with three replica sets
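
A minimal sketch of the web-to-Hive ingestion pipeline from the first bullet above, assuming a hypothetical JSON threat-feed endpoint; in the project itself, runs like this were scheduled by Oozie:

    # Minimal sketch: pull a JSON feed from the web and append it to a Hive table.
    # The endpoint URL and table name are hypothetical placeholders.
    import json
    import requests
    from pyspark.sql import SparkSession

    FEED_URL = "https://feeds.example.com/cyber/attacks"  # hypothetical endpoint

    resp = requests.get(FEED_URL, timeout=30)
    resp.raise_for_status()
    records = resp.json()   # expected shape: a list of dicts, one per attack event

    spark = (
        SparkSession.builder
        .appName("threat-feed-ingest")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Parallelize the payload and let Spark infer the schema from the JSON.
    rdd = spark.sparkContext.parallelize([json.dumps(r) for r in records])
    df = spark.read.json(rdd)

    # Append today's pull to the analytics team's Hive table (hypothetical name).
    df.write.mode("append").format("parquet").saveAsTable("cyber.attack_events")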

Confidential - Foster City, CA

Database Engineer

  • Automated PROD and DR switchover using a shell script, integrated with SRM (see the sketch below)
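
The original automation was a shell script; as a sketch in the same language as the other examples here, a Python wrapper around Oracle's Data Guard broker CLI (dgmgrl) might look like the following. The standby database name and the OS-authenticated connection are assumptions, and the SRM integration is omitted:

    # Minimal sketch: scripted PROD-to-DR switchover via the Data Guard broker.
    # Assumes dgmgrl is on the PATH, OS authentication ("/") is configured, and
    # the broker knows the DR database as "proddb_dr" (hypothetical name).
    import subprocess
    import sys

    STANDBY_DB = "proddb_dr"   # hypothetical broker name of the DR database

    def run_dgmgrl(command: str) -> str:
        """Run one dgmgrl command and return its output, aborting on failure."""
        result = subprocess.run(
            ["dgmgrl", "-silent", "/", command],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            sys.exit(f"dgmgrl failed: {result.stderr}")
        return result.stdout

    # Verify the broker configuration is healthy before switching roles.
    status = run_dgmgrl("show configuration")
    if "SUCCESS" not in status:
        sys.exit(f"Broker configuration not healthy:\n{status}")

    # Swap roles: DR becomes primary, PROD becomes the new standby.
    print(run_dgmgrl(f"switchover to {STANDBY_DB}"))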

Confidential - San Jose, CA

Database Engineer

  • Installed and maintained R12 Oracle Applications for a Cisco International project
  • Applied the latest PSUs to the Oracle database and application software
