Sr Data Engineer Resume
Ridgefield, CT
SUMMARY
- 14+ years of cumulative professional experience in information technology, with expert-level skills in big data, Hadoop, Spark, Hive, Impala, Sqoop, Flume, Kafka, SQL tuning, ETL development, report development, database development, and data modeling, plus strong knowledge of Oracle database architecture
- Expertise in data engineering, data architecture, and big data architecture
- Expert knowledge of and experience in dimensional fact modeling (star schema, snowflake schema), transactional modeling, and slowly changing dimensions (SCD); a brief SCD sketch follows this list
- Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, Pub/Sub, Cloud Dataflow, Cloud Shell, the gsutil and bq command-line utilities, Dataproc, and Stackdriver
- Strong knowledge of and experience with the Cloudera ecosystem (HDFS, YARN, Hive, Sqoop, Flume, HBase, Oozie, Kafka, Pig), data pipelines, and data analysis and processing with Hive SQL, Impala, Spark, and Spark SQL
- Use Flume, Kafka, and Spark Streaming to ingest real-time or near-real-time data into HDFS
- Hands-on experience architecting ETL transformation layers and writing Spark jobs to do the processing
- Hands-on experience building end-to-end data lakes
- Programming experience with Python and Scala
- Hands-on experience with NoSQL databases such as HBase and Cassandra
- Exposure to Redshift as an alternative to Oracle; analyzed approaches for migrating an Oracle database to Redshift
- Maintain AWS IAM and VPC for RDS and EC2
- Provide database architecture solutions based on project requirements
- Proficient in achieving Oracle SQL plan stability: maintaining SQL plan baselines and using ASH, AWR, ADDM, and SQL Tuning Advisor for proactive follow-up and SQL rewrites
- Experience with shell scripting to automate various activities
- Application development with Oracle Forms and Reports, OBIEE, Discoverer, and Report Builder, plus ETL development with OWB
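As a quick illustration of the SCD handling mentioned above, a minimal Type 2 sketch in Python with the python-oracledb driver; the dim_customer/stg_customer tables, columns, and credentials are hypothetical:

```python
import oracledb

# Hypothetical connection details and table names, for illustration only.
conn = oracledb.connect(user="dw_user", password="secret", dsn="dbhost/ORCLPDB1")
cur = conn.cursor()

# Step 1: expire the current dimension row when a tracked attribute changed.
cur.execute("""
    UPDATE dim_customer d
    SET    d.effective_to = SYSDATE, d.is_current = 'N'
    WHERE  d.is_current = 'Y'
    AND    EXISTS (
        SELECT 1 FROM stg_customer s
        WHERE  s.customer_id = d.customer_id
        AND    (s.address <> d.address OR s.segment <> d.segment)
    )
""")

# Step 2: insert a fresh current row for every new or just-expired customer.
cur.execute("""
    INSERT INTO dim_customer
        (customer_id, address, segment, effective_from, effective_to, is_current)
    SELECT s.customer_id, s.address, s.segment, SYSDATE, NULL, 'Y'
    FROM   stg_customer s
    WHERE  NOT EXISTS (
        SELECT 1 FROM dim_customer d
        WHERE  d.customer_id = s.customer_id AND d.is_current = 'Y'
    )
""")
conn.commit()
```

After the first statement expires changed rows, the second re-inserts them (and brand-new customers) as the current version, preserving full history.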
PROFESSIONAL EXPERIENCE
Confidential
Sr Data Engineer
Responsibilities:
- Take the lead role in building and architecting multiple data pipelines, including end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinate tasks among the team
- Design and architect the various layers of the data lake
- Design star schemas in BigQuery
- Load Salesforce data every 15 minutes on an incremental basis into the BigQuery raw and UDM layers using SOQL, Google Dataproc, GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts
- Use REST APIs with Python to ingest data from Fleet Complete (location-tracking software for moving assets) and other sites into BigQuery
- Use Cloud Functions with Python to load arriving CSV files from a GCS bucket into BigQuery (see the Cloud Function sketch after this list)
- Write a program to download a SQL dump from their equipment maintenance site and load it into a GCS bucket; then load the dump from GCS into MySQL (hosted in Google Cloud SQL) and move the data from MySQL to BigQuery using Python, Scala, Spark, and Dataproc
- Process and load bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python (see the Beam sketch after this list)
- Build a program with Python and Apache Beam, executed in Cloud Dataflow, to run data validation between raw source files and BigQuery tables
- Build a configurable Scala- and Spark-based framework that connects to common data sources (MySQL, Oracle, Postgres, SQL Server, Salesforce, BigQuery) and loads them into BigQuery as-is or transformed
- Monitor BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments (dev, UAT, and prod)
- Open SSH tunnels to Google Dataproc to reach the YARN resource manager and monitor Spark jobs
- Stage job artifacts with gsutil and submit Spark jobs for execution on the Dataproc cluster
- Create firewall rules to allow access to Google Dataproc from other machines
- Write a Python program to maintain raw-file archival in the GCS bucket
- Analyze various types of raw files (JSON, CSV, XML) with Python using pandas, NumPy, etc.
- Write Scala programs for Spark transformations in Dataproc
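A minimal sketch of the Cloud Function load described above, wired to the bucket's object-finalize trigger; the project/dataset/table names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table; the real project/dataset/table names differ.
TABLE_ID = "my-project.raw_layer.vehicle_events"

def load_csv_to_bq(event, context):
    """Cloud Function entry point, fired when an object lands in the bucket."""
    if not event["name"].endswith(".csv"):
        return  # ignore non-CSV objects
    uri = f"gs://{event['bucket']}/{event['name']}"
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,          # header row
        autodetect=True,              # infer schema from the file
        write_disposition="WRITE_APPEND",
    )
    load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
    load_job.result()  # block until done so failures surface in the function logs
```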
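And a compact sketch of the Pub/Sub-to-BigQuery path with Apache Beam on Cloud Dataflow; the topic and table names are hypothetical:

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical resource names; the real project/topic/table differ.
TOPIC = "projects/my-project/topics/telemetry"
TABLE = "my-project:raw_layer.telemetry"

def run():
    options = PipelineOptions(streaming=True)  # unbounded source => streaming mode
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic=TOPIC)
            | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                TABLE,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )

if __name__ == "__main__":
    run()
```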
Tools/Technologies: GCP, BigQuery, GCS buckets, Cloud Functions, Apache Beam, Pub/Sub, Cloud Dataflow, Cloud Shell, gsutil, bq command-line utilities, Dataproc, VM instances, Cloud SQL, MySQL, Postgres, SQL Server, Salesforce SOQL, Python, Scala, Spark, Hive, Spark SQL, Sqoop
Confidential, Ridgefield, CT
Lead Data Engineer
Responsibilities:
- Architected the data pipeline for each client and the end-to-end data lake design
- Analyzed client data using Scala, Spark, and Spark SQL and presented an end-to-end data lake design to the team
- Designed the transformation layers, wrote the ETL in Scala and Spark, and distributed the work across the team, including myself
- Kept the team motivated to deliver the project on time and worked side by side with the other members as a team member
- Designed and developed Spark jobs in Scala to implement end-to-end batch data pipelines
- Performed fact/dimension modeling and proposed solutions for loading it
- Processed data with Scala, Spark, and Spark SQL and loaded it into Hive partitioned tables in Parquet format
- Developed Spark jobs with partitioned RDDs (hash, range, and custom partitioners) for faster processing (see the partitioner sketch after this list)
- Developed and deployed Spark/Scala code on Hadoop clusters running on AWS EC2 and AWS EMR
- Developed a near-real-time data pipeline using Flume, Kafka, and Spark Streaming to ingest client data from their web log servers and apply transformations (see the streaming sketch after this list)
- Developed Sqoop scripts and Sqoop jobs to ingest data from client-provided databases in batches on an incremental basis
- Used distcp to copy files from S3 to HDFS, then processed, cleansed, and filtered the data using Scala, Spark, Spark SQL, Hive, and Impala queries and loaded it into Hive tables for data scientists to apply their ML algorithms and generate recommendations, as part of the data lake processing layer
- Loaded recommendations from Hive tables into our internal Oracle database using Sqoop, for consumption by the company's web API and display in the company web portal, as part of the data lake visualization layer
- Defined the data pipeline for various clients
- Built part of the Oracle database in Redshift
- Loaded data into NoSQL databases (HBase, Cassandra)
- Combined all of the above steps in an Oozie workflow to run the end-to-end ETL process
- Used YARN in Cloudera Manager to monitor job processing
- Developed under Scrum methodology in a CI/CD environment using Jenkins
- Maintained the entire Oracle DB infrastructure in AWS RDS and EC2, including storage, availability, performance, and upgrades
- Participated in the architecture council for database architecture recommendations
- Did database design and development based on AI algorithms, with direct guidance from the chief data scientist
- Developed advanced PL/SQL and SQL to help the data scientists
- Performed deep analysis of SQL execution plans and recommended hints, restructuring, indexes, or materialized views for better performance
- Deployed EC2 instances for Oracle databases
- Maintained IAM and VPC for all RDS and EC2 resources
- Some exposure to Jenkins and Ansible
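A minimal PySpark sketch of the custom-partitioning approach from the bullets above (the production jobs were written in Scala); the routing function and pair values are hypothetical:

```python
from pyspark import SparkContext

sc = SparkContext(appName="custom-partitioner-sketch")

# Hypothetical (client_id, payload) pairs; real keys came from client data.
pairs = sc.parallelize([(i, f"record-{i}") for i in range(1000)])

def region_partitioner(client_id):
    """Custom partitioner: route each key to one of 8 partitions.
    A real one would encode business rules (e.g. client region)."""
    return client_id % 8

# partitionBy with a custom function replaces the default hash partitioner,
# so downstream joins/aggregations on the same key avoid a full reshuffle.
partitioned = pairs.partitionBy(8, region_partitioner).cache()
print(partitioned.getNumPartitions())  # 8
```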
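And a sketch of the Kafka ingestion path using Spark Structured Streaming; broker, topic, and paths are hypothetical, the original pipeline used the older DStream API, and running this requires the spark-sql-kafka package on the classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, split

spark = SparkSession.builder.appName("weblog-stream-sketch").getOrCreate()

# Hypothetical broker and topic; real values pointed at the client's Kafka cluster.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "weblogs")
    .load()
)

# Each Kafka value is a raw web-log line; split out a couple of columns
# as a stand-in for the real parsing/transformation logic.
parsed = raw.select(col("value").cast("string").alias("line")).select(
    split(col("line"), " ").getItem(0).alias("ip"),
    split(col("line"), " ").getItem(1).alias("ts"),
)

# Land the parsed stream as Parquet for the downstream Hive tables.
query = (
    parsed.writeStream.format("parquet")
    .option("path", "hdfs:///datalake/raw/weblogs")
    .option("checkpointLocation", "hdfs:///checkpoints/weblogs")
    .start()
)
query.awaitTermination()
```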
Tools/Technologies: Hadoop ecosystem (HDFS, YARN, Pig, Hive, Sqoop, Flume, Oozie, Kafka), Hive SQL, Impala, Spark, Scala, Python, HBase, Cassandra, EC2, EBS volumes, VPC, S3, Oracle 12c, Oracle Enterprise Linux, shell scripting
Confidential, NY
Sr Oracle database Developer/Architect
Responsibilities:
- Took the lead role in architecting and supporting all databases for the Data Core and SESIS projects, from development through production, based on project requirements
- Participated in database design sessions for both OLTP and OLAP, advising on Oracle advanced features and the correct path to maximum performance
- Developed logical and physical dimensional and transactional models for Data Core's DWH, data marts, and ODS, and presented them to upper management
- Architected the ETL batch pipeline to consume data from 20+ legacy systems
- Built Data Core's DWH, data marts, and ODS from scratch
- Worked directly with the application team to develop efficient queries for maximum performance
- Analyzed execution plans of critical SQL statements to identify the best way to achieve maximum performance (see the execution-plan sketch after this list)
- Tuned badly performing SQL statements in the test environment before they went to production
- Identified and resolved latch contention, disk I/O contention, and locking
- Generated performance reports and performed reactive and proactive performance analysis using Oracle-provided tools such as AWR, ADDM, ASH, and the SQL Tuning and other advisors
- Used OS-level utilities (vmstat, iostat, mpstat, sar, strace, etc.) to diagnose performance issues
- Designed security solutions to protect sensitive data
- Worked side by side with ETL, .NET, Java, and PL/SQL developers to identify performance bottlenecks
- Designed suitable partitioned tables with partitioned indexes, materialized views, and proper indexing methods to achieve maximum performance
- Performed deep analysis of SQL execution plans and PL/SQL to identify performance bottlenecks
- Maintained VPD (Virtual Private Database)
- Implemented materialized views to synchronize data with other environments, with query rewrite enabled to speed up aggregate SQL (see the materialized-view sketch after this list)
- Developed system-level stored procedures to automatically defragment fragmented tables, rebuild indexes, and automate the purging policy
- Developed shell scripts for process automation
- Followed Agile methodology
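A small sketch of pulling an execution plan for a problem query with python-oracledb; connection details and table names are hypothetical, and the real analysis combined this with AWR/ASH and advisor reports:

```python
import oracledb

# Hypothetical credentials/DSN for illustration only.
conn = oracledb.connect(user="perf_admin", password="secret", dsn="dbhost/ORCLPDB1")
cur = conn.cursor()

# Explain the problem statement, then read the plan back with DBMS_XPLAN.
cur.execute("""
    EXPLAIN PLAN FOR
    SELECT o.order_id, c.customer_name
    FROM   orders o JOIN customers c ON c.customer_id = o.customer_id
    WHERE  o.order_date >= SYSDATE - 7
""")
cur.execute("SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())")
for (line,) in cur:
    print(line)  # shows full-scan vs index access, join order, cost estimates
```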
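And a sketch of the materialized-view pattern with query rewrite enabled, again with hypothetical names:

```python
import oracledb

# Hypothetical names; illustrates the aggregate-MV-plus-query-rewrite pattern.
conn = oracledb.connect(user="dw_user", password="secret", dsn="dbhost/ORCLPDB1")
cur = conn.cursor()

# Pre-aggregate a heavy rollup; with QUERY_REWRITE_ENABLED set, the optimizer
# transparently answers matching aggregate queries from the MV instead of
# rescanning the detail table.
cur.execute("""
    CREATE MATERIALIZED VIEW mv_daily_sales
    BUILD IMMEDIATE
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE
    AS
    SELECT product_id, TRUNC(sale_date) AS sale_day,
           SUM(amount) AS total_amount, COUNT(*) AS txn_count
    FROM   sales
    GROUP  BY product_id, TRUNC(sale_date)
""")
```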
Tools/Technologies: Erwin, Oracle 11gR2 RAC, RMAN, Oracle GoldenGate, Oracle Enterprise Linux, Data Guard, AWR, ADDM, Flashback, shell scripting, Oradebug, Oracle 11g and 10g
Confidential, Parsippany, NJ
Sr Oracle Developer
Responsibilities:
- Coordinated with the global team to identify performance bottlenecks
- Tuned the database with standard diagnostic tools (AWR, ASH, ADDM, Oracle wait events, SQL Tuning Advisor, and other advisors)
- Optimized database performance by introducing indexes, partitioned tables, and materialized views using Oracle 11g features
- Proactively monitored all expensive SQL statements across the environment
- Created and maintained partitioned tables
- Participated in design and development sessions for PL/SQL process flows to guide the team toward maximum performance
- Identified process tables that were candidates for partitioning
- Performed heavy SQL tuning
- Identified and resolved latch contention, disk I/O contention, and locking
- Developed and tuned PL/SQL processes
- Loaded data using SQL*Loader and external tables (see the external-table sketch after this list)
- Participated in physical database implementation from development through production
- Used OS-level utilities (vmstat, iostat, mpstat, sar, strace, etc.) to diagnose performance issues
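A brief sketch of the external-table approach to loading flat files, with hypothetical directory, file, and table names, executed here through python-oracledb:

```python
import oracledb

# Hypothetical credentials, DIRECTORY object, and table names.
conn = oracledb.connect(user="etl_user", password="secret", dsn="dbhost/ORCLPDB1")
cur = conn.cursor()

# External table over a flat file in a pre-created DIRECTORY object;
# the file becomes queryable without a conventional load step.
cur.execute("""
    CREATE TABLE rates_ext (
        rate_date  VARCHAR2(10),
        rate_code  VARCHAR2(10),
        rate_value NUMBER
    )
    ORGANIZATION EXTERNAL (
        TYPE ORACLE_LOADER
        DEFAULT DIRECTORY etl_dir
        ACCESS PARAMETERS (
            RECORDS DELIMITED BY NEWLINE
            FIELDS TERMINATED BY ','
        )
        LOCATION ('rates.csv')
    )
""")

# A set-based INSERT ... SELECT then moves the rows into the target table.
cur.execute("""
    INSERT INTO rates (rate_date, rate_code, rate_value)
    SELECT TO_DATE(rate_date, 'YYYY-MM-DD'), rate_code, rate_value
    FROM   rates_ext
""")
conn.commit()
```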
Technology: Erwin, Oracle GoldenGate, RMAN, Sun Solaris 10, Grid Control, Linux, Data Guard, AWR, Flashback, shell scripting, Oracle 11g and 10g
Confidential, Parsippany, NJ
Sr Database Developer/Architect
Responsibilities:
- Developed dimensional and transactional models for the ODS and DWH for product information management
- Developed PL/SQL processes to establish a batch ETL pipeline loading data from the PIM MDM system into the Confidential ODS and DWH, consumed by www. Confidential .com and all kinds of BI reporting
- Deeply analyzed SQL execution plans and tuned badly performing queries
- Worked closely with application developers and architects to guide them through Oracle advanced features and performance-tuning-related features
- Re-indexed fragmented objects
- Proactively monitored all expensive SQL statements across the environment
- Generated server and Oracle database performance reports using Statspack, AWR, ADDM, ASH, SQL Tuning Advisor, and other advisors
- Shell scripting and job scheduling with crontab
- Worked with the development team to tune SQL and PL/SQL for better performance
- Used OS-level utilities (vmstat, iostat, mpstat, sar, strace, etc.) to diagnose performance issues
Technology: Erwin, Oracle Streams, RMAN, Sun Solaris 10, Grid Control, Linux, Data Guard, AWR, Flashback, shell scripting, Oracle 11g and 10g
Confidential
Team Lead, DWBI (Data Warehousing and Business Intelligence)
Responsibilities:
- Led a team of 15 people
- Did dimensional and operational data modeling for the entire project
- Developed the database architecture solution for staging, ODS, and data marts
- Participated directly with the developers in building critical PL/SQL processes
- Monitored ADDM, AWR, ASH, SQL Tuning Advisor, and other advisors for overall database performance
- Worked effectively on SQL performance tuning
- Worked closely with the developers on advanced PL/SQL and SQL
- Participated in and guided the architecture, design, and development of ETL mappings with OWB to share the workload
- Developed BI reports using OBIEE to share the workload
- Provided full guidance on developing the control interface with Oracle Forms Developer
- Coordinated requirement analysis
- Used OS-level utilities (vmstat, iostat, mpstat, sar, strace, etc.) to diagnose performance issues
Technology: Oracle 11gR1, HP-UX, OBIEE 10g, OWB 10g, Shell scripting, Oracle PL/SQL, RMAN, AWR, STA, CBO, MS Visio, Oracle Forms Developer 10g
Confidential
Oracle Database Engineer
Responsibilities:
- Participated in end-to-end dimensional data modeling discussions
- Analyzed source data and reporting requirements to build the ODS, EDW, and data mart architecture
- Participated in ETL architecture discussion sessions
- Developed and tuned ETL processes to load the star and snowflake schemas in the ODS, EDW, and data marts
- Took part in developing advanced PL/SQL, complex SQL, OWB ETL mappings, and BI reports with OBIEE
- Created and maintained partitioned tables
- Developed guidelines and operational policies and procedures
- Reviewed SQL performance and tuning
- Performed cross-platform data transfer using Oracle external tables and Data Pump
- Monitored ADDM, AWR, ASH, SQL Tuning Advisor, and other advisors for overall database performance
- Monitored database servers with OS utilities such as vmstat, iostat, mpstat, sar, and strace
Technology: Oracle 10g, Sun Solaris, Shell scripting, Oracle PL/SQL, RMAN, AWR, STA, CBO, RAC, MS Visio, OWB 10g, OBIEE 10g
Confidential
Developer
Responsibilities:
- Analyzed requirements to identify attributes and entities
- Participated in transactional data modeling sessions with the team lead
- Developed required forms and reports for multiple modules (engineering, inventory, purchase, procurement, HRMS, accounts, etc.)
- Used SQL*Loader to load data from flat files for the engineering module
- Built libraries for Forms Developer
- Developed reports with Report Builder
- Developed the necessary PL/SQL
Technology: Oracle 9i and 10g, Oracle Forms and Reports 6i and 10g, Sun Solaris, Shell scripting, Oracle PL/SQL, MS Visio