Sr. Data/ Hadoop Engineer Resume
Beaverton, OR
SUMMARY
- Over 9 years of experience in the information technology industry, providing strong business solutions with excellent technical, communication, and customer service expertise.
- Over 3 years of extensive experience in Hadoop, Spark, and Big Data technologies
- Strong experience using Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, and Impala.
- Strong Experience in AWS Cloud services like EC2 and S3
- Expertise in Creating, Debugging, Scheduling and Monitoring jobs using Airflow and Oozie.
- Strong experience working with Python and Unix shell scripts.
- Experience in Creating, Scheduling and Debugging Spark jobs using Python.
- Excellent understanding of Spark and its benefits in Big Data analytics
- Strong experience creating, debugging, and successfully running jobs on EMR clusters.
- Strong and Extensive experience in dealing with data files in AWS S3
- Experience working with Amazon RedShift database.
- In-depth knowledge working with both Parquet and Avro data files
- In-depth experience Sqooping a wide range of data volumes from Oracle into the Hadoop environment
- Strong Experience in Parsing both structured and unstructured data files using Data Frames in PySpark
- Strong debugging skills using the Cloudera Resource Manager
- Strong performance improvement techniques that help Hadoop jobs run faster.
- Good knowledge of NoSQL databases like Cassandra and HBase
- Unique experience building ETL scripts across tools and languages such as PL/SQL, Informatica, Hive, Pig, and PySpark.
- Expert-level knowledge of Oracle PL/SQL programming in Oracle 10g/11g
- Expert-level knowledge in design and development of PL/SQL Packages, Procedures, Functions, Triggers, Views, Sequences, Indexes and other DB objects, SQL Performance Tuning.
- Designed PL/SQL implementations; optimized and troubleshot existing PL/SQL packages
- Demonstrated experience using Oracle Collections, bulking techniques, partition utilization to increase performance.
- Very strong in data modeling techniques in normalized (OLTP) modeling
- Expertise in creating complex Informatica mappings and workflows.
- Expertise in Performance tuning Informatica workflows
- Provide metrics and project planning updates for the development effort in Agile Projects.
- Strong experience working in projects involving Agile Methodologies
- Strong knowledge and use of development methodologies, standards, and procedures.
- Strong leadership qualities with excellent written and verbal communications skills.
- Ability to multi-task and provide expertise for multiple development teams across concurrent project tasks.
- Good time management skills & Strong problem solving skills
- Successfully coordinated & delivered several projects for Confidential
- Exposure to all phases of software development life cycle (SDLC)
- Involved in module integration, integration testing, and finalizing unit test cases
- Excellent interpersonal skills, an innate ability to motivate others, and openness to new and innovative ideas for the best possible solution.
TECHNICAL SKILLS
Operating Systems: Sun Solaris 5.6, UNIX, Red Hat Linux 3, Windows NT/95/98/2000/XP
Languages: C, C++, PL/SQL, Shell Scripting, HTML, XML, Java, Python, HQL, PIG
Databases: Oracle 7.3, 8, 8i, 9i, 10g, 11g, SQL Server CE, HBase, Cassandra
Tools & Utilities: TOAD, SQL Developer, SQL Navigator, Erwin, SQL*Plus, PL/SQL Editor, SQL*Loader, Informatica, Autosys, Airflow, Subversion, Git-Bucket, Jenkins
PROFESSIONAL EXPERIENCE
Confidential, Beaverton, OR
Sr. Data/ Hadoop Engineer
Responsibilities:
- Developed Sqoop scripts to migrate data from Oracle into the Big Data environment (see the Sqoop sketch after this list)
- Migrated the functionality of Informatica jobs to HQL scripts using HIVE
- Developed ETL jobs using PIG, HIVE and SPARK
- Extensively worked with Avro and Parquet files and converted data between the two formats
- Parsed semi-structured JSON data and converted it to Parquet using DataFrames in PySpark (see the PySpark sketch after this list).
- Created Python UDFs for use in Spark (see the UDF sketch after this list)
- Created Hive DDL on Parquet and Avro data files residing in both HDFS and S3 buckets (see the DDL sketch after this list)
- Created Airflow scheduling scripts (DAGs) in Python (see the Airflow sketch after this list)
- Worked extensively on Sqooping a wide range of data sets
- Extensively worked in Sentry Enabled system which enforces data security
- Involved in file movement between HDFS and AWS S3 (see the distcp sketch after this list)
- Extensively worked with S3 bucket in AWS
- Created Oozie workflows for scheduling
- Created data partitions on large data sets in S3 and DDL on partitioned data.
- Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.
- Independently drove multiple small projects to completion with quality output
- Extensively used Stash Git-Bucket for Code Control
- Monitored and troubleshot Hadoop jobs using the YARN Resource Manager
- Monitored and troubleshot EMR job logs using Genie
- Provided mentorship to fellow Hadoop developers
- Provided Solutions to technical issues in Big data
- Explained issues in layman's terms to help BSAs understand
- Worked simultaneously on multiple tasks.
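A minimal sketch of the kind of Sqoop import referenced above, invoked from a Python wrapper; the JDBC URL, credentials file, table, and target directory are placeholder assumptions, not actual project values.

```python
# Hypothetical Sqoop import from Oracle into HDFS, run from Python.
# Connection string, password file, table, and target directory are assumed.
import subprocess

subprocess.check_call([
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//oracle-host:1521/ORCL",
    "--username", "etl_user",
    "--password-file", "/user/etl/.oracle_pass",  # keeps the password off the command line
    "--table", "SALES.ORDERS",
    "--target-dir", "/data/raw/orders",
    "--num-mappers", "4",
    "--as-avrodatafile",                           # land the data as Avro files
])
```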
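A minimal PySpark sketch of parsing semi-structured JSON with DataFrames and writing Parquet, as mentioned above; the S3 paths and column names are illustrative assumptions.

```python
# Hedged sketch: read semi-structured JSON into a DataFrame and write Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json_to_parquet").getOrCreate()

# multiLine handles JSON records that span multiple lines
raw_df = spark.read.option("multiLine", "true").json("s3a://example-bucket/raw/events/")

# flatten a nested field and keep a few top-level columns (names assumed)
flat_df = raw_df.selectExpr("id", "event_type", "payload.user_id AS user_id")

flat_df.write.mode("overwrite").parquet("s3a://example-bucket/curated/events/")
```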
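A minimal sketch of a Python UDF registered for use from both Spark SQL and the DataFrame API; the normalization rule and names are assumed for illustration.

```python
# Hedged sketch of a Python UDF used in Spark; the cleanup rule is an assumed example.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf_example").getOrCreate()

def normalize_code(value):
    """Trim and upper-case a free-text code; pass nulls through unchanged."""
    return value.strip().upper() if value is not None else None

# register for Spark SQL and wrap for DataFrame use
spark.udf.register("normalize_code", normalize_code, StringType())
normalize_udf = udf(normalize_code, StringType())

df = spark.createDataFrame([(" ab12 ",), (None,)], ["code"])
df.select(normalize_udf("code").alias("code_clean")).show()
```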
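A minimal sketch of the kind of Hive DDL created over partitioned Parquet data in S3, issued here through spark.sql; the database, table, columns, and bucket are assumptions.

```python
# Hedged sketch: external Hive table over partitioned Parquet data in S3.
# Database, table, column, and bucket names are placeholder assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ddl_example").enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events (
        id         STRING,
        user_id    STRING,
        event_type STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION 's3a://example-bucket/curated/events/'
""")

# pick up partitions already laid out as event_date=YYYY-MM-DD/ prefixes
spark.sql("MSCK REPAIR TABLE analytics.events")
```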
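A minimal Airflow DAG sketch of the Python scheduling scripts mentioned above; the DAG id, schedule, owner, and spark-submit command are assumed placeholders.

```python
# Hedged sketch of an Airflow DAG that schedules a daily PySpark job.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
}

dag = DAG(
    dag_id="daily_events_load",
    default_args=default_args,
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily",
    catchup=False,
)

run_pyspark_job = BashOperator(
    task_id="run_pyspark_job",
    bash_command="spark-submit --master yarn /opt/jobs/json_to_parquet.py",
    dag=dag,
)
```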
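A minimal sketch of moving files between HDFS and S3; hadoop distcp is one common approach, invoked here from Python, with source and destination URIs assumed.

```python
# Hedged sketch: copy a directory tree between HDFS and S3 with hadoop distcp.
# The source and destination URIs are illustrative assumptions.
import subprocess

src = "hdfs:///data/curated/events/"
dst = "s3a://example-bucket/curated/events/"

# distcp runs as a distributed MapReduce copy; -update skips files already present
subprocess.check_call(["hadoop", "distcp", "-update", src, dst])
```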
Confidential, Beaverton, OR
Sr. Data Engineer
Responsibilities:
- Gathered requirements and system specifications from business users.
- Developed PL/SQL Packages, Procedures, Functions, Triggers, Views, Indexes, Sequences and Synonyms.
- Developed complex Informatica workflows and mappings.
- Worked on tuning Informatica mappings using partitioning techniques
- Extensively involved in tuning slow performing queries, procedures and functions.
- Extensively worked in an OLAP environment.
- Coordinated between OLTP and OLAP systems and teams.
- Extensively used collections and collection types to improve the data upload performance
- Coordinated with the QA team regularly on test scenarios and functionality.
- Organized knowledge sharing sessions with PS team.
- Identified and created missing DB links and indexes, and analyzed tables, which helped improve the performance of poorly running SQL queries.
- Involved in both logical and physical model design.
- Extensively worked with DBA Team for refreshing the pre-production databases.
- Created index organized tables
- Simultaneously worked on multiple applications.
- Involved in estimating the effort required for the database tasks
- Involved in fixing production bugs both within and outside assigned projects
- Explained issues in layman's terms to help BSAs understand
- Executed Jobs in Unix Environment
- Involved in learning Hadoop technology and coding a couple of Hadoop scripts.
- Involved in many dry-run activities to ensure smooth production releases
- Involved extensively in creating a release plan during the project Go-Live
Confidential, Beaverton, OR
Oracle Developer
Responsibilities:
- Gathered requirements and system specifications from business users.
- Developed PL/SQL Packages, Procedures, Functions, Triggers, Views, Indexes, Sequences and Synonyms.
- Extensively involved in tuning slow performing queries, procedures and functions.
- Extensively used collections and collection types to improve the data upload performance into ATLAS.
- Worked with the ETL team on loading data from Oracle 10g into Teradata
- Coordinated with the QA team regularly on test scenarios and functionality.
- Organized knowledge sharing sessions with PS team.
- Identified and created missing DB links and indexes, and analyzed tables, which helped improve the performance of poorly running SQL queries.
- Involved in both logical and physical model design.
- Extensively worked with DBA Team for refreshing the pre-production databases.
- Worked closely with the JBoss team to support their data needs.
- Worked on the APEX tool, which is used to create and store Customer Store information.
- Created index organized tables
- Closely worked with SAP systems.
- Simultaneously worked on multiple applications.
- Involved in estimating the effort required for the database tasks
- Involved in fixing production bugs both within and outside assigned projects
- Explained issues in layman's terms to help BSAs understand
- Executed Jobs in Unix Environment
- Involved in many dry-run activities to ensure smooth production releases
- Involved extensively in creating a release plan during the project Go-Live
- Coordinated with the DBA team to gather Statspack reports for a given time frame, showing the database load and activity during that window.