Senior Big Data Engineer Resume
Mountain View, CA
SUMMARY:
- A goal-oriented and enthusiastic professional with a versatile skill set and high learning agility, looking to apply experience with large-scale big data methods and data pipelines in a Data Engineer position at a world-class, high-integrity company
- Holds a B.E. in Electrical & Electronics Engineering
- Ingest data into HDFS from heterogeneous sources such as S3, the web via REST API calls, and databases using BCP (SQL Server) and Sqoop (Oracle)
- Design and develop ETL using a variety of tools such as PySpark, bash scripting, and Hive to integrate disparate systems
- Possesses hands-on experience building ETL data pipelines that ingest CSV/Text/JSON data from the web into Hadoop HDFS and load it into Hive tables (see the sketch after this summary)
- Big data experience includes a strong understanding of the internals of the Hadoop ecosystem, including Apache Sqoop and Hadoop technologies in Cloudera 5.11; Sqoop to ingest data from Oracle and SQL Server; Oozie to schedule jobs
- Fluent in Python, with a very good understanding of data structures and algorithms
- Hands-on experience with large-scale big data methods, including Hadoop (HDFS, Oozie), Spark, Hive (data transformation), Impala & Hue
- Experience in schema modeling using Erwin and creating mapping documents for ETL transformation and load
- Experience in handling structured and unstructured data, with the aim of providing clean, usable data
- Worked with Amazon AWS S3 and EC2; served as Database Architect and provided load-balancing solutions (Oracle RAC); very knowledgeable in horizontal and vertical scaling, memory management, and disk maintenance
- Used Git for version control and JIRA/Cherwell for ticketing
- Provided on-call support for ETL job failures, performed root-cause analysis (RCA), recommended long-term solutions, and met SLAs
- Collaborated with Data Scientists, Analysts & TPMs to understand requirements and recommend ways to improve data reliability, efficiency, and quality
- Employs excellent problem-solving and verbal/written communication skills in all interactions
- Demonstrated ability to explain technical concepts in a manner that is easily understood by non-technical professionals
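A minimal, hypothetical sketch of the web-to-HDFS-to-Hive ingestion pattern referenced above; the file paths, database, and table names are illustrative placeholders, not actual project artifacts:

```python
# Hypothetical sketch: land CSV/JSON data on HDFS and expose it as Hive tables.
# All paths and table names below are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("web_ingest_to_hive")
    .enableHiveSupport()  # allows Spark to create/write managed Hive tables
    .getOrCreate()
)

# Read raw files already landed on HDFS (CSV with header, JSON lines)
csv_df = spark.read.option("header", "true").csv("hdfs:///landing/vendor/orders/*.csv")
json_df = spark.read.json("hdfs:///landing/web/events/*.json")

# Light cleanup, then persist as Hive tables in Parquet for downstream ETL
csv_df.dropDuplicates().write.mode("overwrite").format("parquet") \
    .saveAsTable("staging.vendor_orders")

json_df.write.mode("append").format("parquet") \
    .saveAsTable("staging.web_events")
```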
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, AWS S3, Spark SQL, Hue, Cloudera Manager
Programming Languages: Python, Bash Script, SQL
Databases: Oracle, SQL Server, MongoDB
Operating Systems: Linux, Sun Solaris, AIX & Windows
Scheduling Tools: Oozie, Crontab
PROFESSIONAL EXPERIENCE:
Senior Big Data Engineer
Confidential - Mountain View, CA
- Responsible for the Omnichannel Customer 360 project, which provides a single, complete, actionable view of each customer
- Created a data lake by extracting data from heterogeneous sources such as vendor flat files (CSV/Excel), databases using BCP/Sqoop, and the web using REST APIs (JSON)
- Used Apache Hive/Impala to build ETL on HDFS data with dynamically partitioned tables for efficiency, and provided data in the requested format for building dashboards
- Built aggregate layers using Hive/Impala queries to implement business logic for weekly/monthly reports (see the sketch after this list)
- Designed ETL using internal/external tables, stored in Parquet format for efficiency
- Improved performance through partitioning, data physicalization, and Hive tuning parameters
- Developed scripts for a process-validation framework that sends data-quality exception reports and job-failure alerts
- Developed a business-validation framework that validates source tables against dashboard views and reports exceptions
- Performed data modeling using techniques such as dynamic partitioning and Parquet table storage for effective storage and retrieval of data from HDFS
- Used bash scripting and Python to automate ETL and scheduled ETL jobs in Oozie
- Used JIRA for ticketing and Git for source control
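A hedged sketch of the dynamically partitioned, Parquet-backed aggregate layer described in the bullets above; the schema and business logic are illustrative assumptions, not the actual Customer 360 model:

```python
# Hypothetical sketch: build a weekly aggregate table with Hive-style dynamic
# partitioning, stored as Parquet. Table and column names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("customer360_weekly_aggregates")
    .enableHiveSupport()
    .getOrCreate()
)

# Enable dynamic partitioning for the INSERT below
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS mart.customer_weekly_agg (
        customer_id  STRING,
        order_count  BIGINT,
        total_amount DOUBLE
    )
    PARTITIONED BY (week_ending STRING)
    STORED AS PARQUET
""")

# Populate the aggregate using dynamic partitioning (partition column last in SELECT)
spark.sql("""
    INSERT OVERWRITE TABLE mart.customer_weekly_agg PARTITION (week_ending)
    SELECT customer_id,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_amount,
           week_ending
    FROM   staging.orders
    GROUP  BY customer_id, week_ending
""")
```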
Confidential - San Ramon, CA
Senior Data Architect
- Created a data pipeline that ingested cyber-attack information from the web utilizing Python scripts; uploaded to Hive via Oozie scheduler for use by the Cybersecurity Analytics Team
- Designed managed and external Hive tables in Parquet/Avro/ORC formats for efficient storage, compression, and retrieval
- Contributed to database schema design using Erwin and developed Hive tables to load data from heterogeneous systems including Oracle, SQL Server, and the web
- Partnered with the Data Science team to automate a machine learning model for the bank's online account attrition using shell scripts
- Designed and executed Oozie workflows that scheduled Sqoop and Hive actions to extract, transform, and load data
- Aided the Analytics team in deriving a method for retrieving data from Oracle/SQL Server using complex ETL queries, importing it into Hive with Sqoop full and incremental loads, and storing it in HDFS
- Gained exposure to PySpark dataframes and transformations for building machine learning models (illustrative sketch after this list)
- Acquired experience in SQL Tuning and gained proficiency with Oracle Tuning features like SQL Profiling and SQL Hints
- Teamed with the Data Analyst and Data Scientist on several joint endeavors
- Became familiar with machine learning algorithms (Linear Regression, Logistic Regression & Random Forest)
- Deployed data replication from SQL Server to Hive for the Risk Analytics team using Oracle GoldenGate
- Set up a sharded MongoDB cluster with 3 replica sets
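A brief, illustrative PySpark sketch of the kind of dataframe transformations used to prepare features for the attrition model mentioned above; the source tables, column names, and 90-day window are assumptions for illustration:

```python
# Hypothetical sketch: prepare model features with PySpark dataframe transformations.
# Source tables, column names, and the 90-day window are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("attrition_feature_prep")
    .enableHiveSupport()
    .getOrCreate()
)

accounts = spark.table("banking.online_accounts")
logins = spark.table("banking.login_events")

# Aggregate recent login activity per account
activity = (
    logins
    .where(F.col("event_date") >= F.date_sub(F.current_date(), 90))
    .groupBy("account_id")
    .agg(
        F.count("*").alias("logins_90d"),
        F.max("event_date").alias("last_login"),
    )
)

# Join activity back to the account dimension and derive simple features
features = (
    accounts
    .join(activity, on="account_id", how="left")
    .withColumn("logins_90d", F.coalesce(F.col("logins_90d"), F.lit(0)))
    .withColumn("days_since_login", F.datediff(F.current_date(), F.col("last_login")))
)

features.write.mode("overwrite").format("parquet").saveAsTable("analytics.attrition_features")
```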
Confidential - Foster City, CA
Database Engineer
- Automated PROD and DR switchover using shell script; integrated with SRM
Confidential - San Jose, CA
Database Engineer
- Installed and maintained R12 Oracle Applications for a Cisco International project
- Maintained and applied the latest PSUs for Oracle Database and application software