
Sr. Data/Hadoop Engineer Resume

Seattle, WA

PROFESSIONAL SUMMARY:

  • Over 9 years of experience in the information technology industry, delivering strong business solutions with excellent technical, communication, and customer service expertise.
  • Over 4 years of extensive experience in Hadoop, Spark, and Big Data technologies.
  • Strong experience using Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, and Impala.
  • Strong experience with AWS cloud services such as EC2 and S3.
  • Strong working experience in monitoring data using Splunk.
  • Strong experience in guiding and building CI/CD pipelines.
  • Strong coding experience in Python, Unix shell, and PowerShell.
  • Expertise in creating, debugging, scheduling, and monitoring jobs using Airflow and Oozie.
  • Experience in creating, scheduling, and debugging Spark jobs using Python.
  • Excellent understanding of Spark and its benefits in Big Data analytics.
  • Strong experience in creating, debugging, and successfully running jobs on EMR clusters.
  • Extensive experience working with data files in AWS S3.
  • Experience working with Amazon RedShift database.
  • In-depth knowledge of working with both Parquet and Avro data files.
  • In-depth experience using Sqoop to move data sets of widely varying sizes from Oracle to the Hadoop environment.
  • Strong experience parsing both structured and unstructured data files using DataFrames in PySpark (see the PySpark sketch at the end of this summary).
  • Strong debugging skills using the Cloudera Resource Manager.
  • Strong command of performance improvement techniques that help Hadoop jobs run faster.
  • Strong knowledge of NoSQL databases such as DocumentDB, Cassandra, and HBase.
  • Experience building ETL scripts in a range of technologies, including PL/SQL, Informatica, Hive, Pig, and PySpark.
  • Expert-level knowledge of PL/SQL programming in Oracle 10g/11g.
  • Expert-level knowledge in the design and development of PL/SQL packages, procedures, functions, triggers, views, sequences, indexes, and other database objects, as well as SQL performance tuning.
  • Design PL/SQL implementations; optimize and troubleshoot existing PL/SQL packages.
  • Demonstrated experience using Oracle collections, bulk-processing techniques, and partitioning to increase performance.
  • Very strong in normalized (OLTP) data modeling techniques.
  • Expertise in performance tuning of Informatica mappings and workflows.
  • Provide metrics and project planning updates for development efforts on Agile projects.
  • Strong knowledge and use of development methodologies, standards, and procedures.
  • Strong leadership qualities with excellent written and verbal communications skills.
  • Ability to multi-task and provide expertise for multiple development teams across concurrent project tasks.
  • Good time management and strong problem-solving skills.
  • Exposure to all phases of software development life cycle (SDLC)
  • Excellent interpersonal skills, an innate ability to motivate others, and openness to new and innovative ideas for the best possible solution.
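
As a minimal illustration of the PySpark DataFrame work referenced above, the sketch below reads semi-structured JSON into a DataFrame, applies light shaping, and writes it out as partitioned Parquet. The bucket, paths, and column names are hypothetical placeholders, not code from the projects described in this resume.

```python
# Minimal PySpark sketch: parse semi-structured JSON and persist it as Parquet.
# Bucket, paths, and column names are illustrative placeholders only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = (SparkSession.builder
         .appName("json-to-parquet-example")
         .getOrCreate())

# Read semi-structured JSON; Spark infers a schema from the records.
orders = spark.read.json("s3://example-bucket/raw/orders/")

# Select and cast the fields downstream jobs expect, and derive a date column.
cleaned = (orders
           .select(col("order_id").cast("long"),
                   col("customer_id").cast("long"),
                   col("order_ts").cast("timestamp"),
                   col("amount").cast("double"))
           .withColumn("order_date", to_date(col("order_ts"))))

# Write columnar Parquet, partitioned so queries can prune by date.
(cleaned.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-bucket/curated/orders/"))

spark.stop()
```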

TECHNICAL SKILLS:

OPERATING SYSTEMS: Sun Solaris 5.6, UNIX, Red Hat Linux 3, Windows NT/95/98/2000/XP

LANGUAGES: C, C++, PL/SQL, Shell Scripting, HTML, XML, Java, Python, HQL, Pig, U-SQL, PowerShell

DATABASES: Oracle 7.3, 8, 8i, 9i, 10g, 11g, SQL Server CE, HBase, Cassandra, DocumentDB

TOOLS: TOAD, SQL Developer, SQL Navigator, Erwin, SQL*Plus, PL/SQL Editor, SQL*Loader, Informatica, Autosys, Airflow, Subversion, Bitbucket, Jenkins, Visual Studio Enterprise 2015

HADOOP DISTRIBUTIONS: Cloudera, Amazon Web Services, Hortonworks, Azure

PROFESSIONAL EXPERIENCE:

Confidential, Seattle, WA

Sr. Data/Hadoop Engineer

Roles & Responsibilities:

  • Developed Sqoop scripts to migrate data from Oracle to the big data environment.
  • Migrated the functionality of Informatica jobs to HQL scripts in Hive.
  • Developed ETL jobs using Pig, Hive, and Spark.
  • Extensively worked with Avro and Parquet files and converted data between the two formats.
  • Parsed semi-structured JSON data and converted it to Parquet using DataFrames in PySpark.
  • Created Python UDFs for use in Spark.
  • Created Hive DDL on Parquet and Avro data files residing in both HDFS and S3 buckets (a DDL sketch follows the environment line below).
  • Created Airflow scheduling scripts in Python (see the DAG sketch at the end of this list).
  • Worked extensively with Sqoop to ingest a wide range of data sets.
  • Worked extensively in a Sentry-enabled environment that enforces data security.
  • Involved in file movements between HDFS and AWS S3
  • Extensively worked with S3 bucket in AWS
  • Created Oozie workflows for scheduling
  • Created tables and views in RedShift Database
  • Imported data from S3 buckets to Redshift
  • Created data partitions on large data sets in S3 and DDL on partitioned data.
  • Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.
  • Independently drove multiple small projects to completion with quality output.
  • Extensively used Stash/Bitbucket for source control.
  • Monitored and troubleshot Hadoop jobs using the YARN Resource Manager.
  • Monitored and troubleshot EMR job logs using Genie.
  • Provided mentorship to fellow Hadoop developers.
  • Provided solutions to technical issues in big data.
  • Explained issues in layman's terms to help BSAs understand.
  • Worked simultaneously on multiple tasks.
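
For reference, a minimal sketch of the Airflow scheduling pattern mentioned above, using Airflow 2.x import paths. The DAG id, schedule, and shell/Spark commands are hypothetical placeholders, not the original project's code.

```python
# Minimal Airflow DAG sketch (Airflow 2.x); ids, schedule, and commands are illustrative only.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="oracle_to_hadoop_daily",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Step 1: land the day's extract (e.g., a Sqoop or ingestion wrapper script).
    ingest = BashOperator(
        task_id="ingest_raw_data",
        bash_command="sh /opt/jobs/ingest_raw_data.sh {{ ds }}",
    )

    # Step 2: transform the landed files with a PySpark job.
    transform = BashOperator(
        task_id="transform_to_parquet",
        bash_command="spark-submit /opt/jobs/transform_to_parquet.py --run-date {{ ds }}",
    )

    # Run the transform only after ingestion succeeds.
    ingest >> transform
```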

Environment: Sqoop, ETL, Pig, Hive, Spark, Python, HDFS, AWS S3, Airflow, Redshift, EMR, Bitbucket, YARN, Genie, Unix.
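
Below is a sketch of the Hive DDL pattern described above (an external, partitioned table over Parquet files in S3), issued through PySpark's SQL interface. The database, table, column, and bucket names are hypothetical placeholders.

```python
# Sketch of external-table DDL over partitioned S3 Parquet, via spark.sql.
# Database, table, columns, and bucket are illustrative placeholders only.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-ddl-example")
         .enableHiveSupport()
         .getOrCreate())

# External table over Parquet files that already live in S3.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
    LOCATION 's3://example-bucket/curated/orders/'
""")

# Register the partition directories that exist under the table location.
spark.sql("MSCK REPAIR TABLE analytics.orders")

spark.stop()
```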

Confidential, Los Angeles, CA

Sr. Data Engineer

Roles & Responsibilities:

  • Gathered requirements and system specifications from the business users.
  • Developed PL/SQL Packages, Procedures, Functions, Triggers, Views, Indexes, Sequences and Synonyms.
  • Developed complex Informatica workflows and mappings.
  • Worked on tuning Informatica mappings using partitioning techniques.
  • Extensively involved in tuning slow performing queries, procedures and functions.
  • Extensively worked in OLAP environment.
  • Coordinated between OLTP and OLAP systems and teams.
  • Extensively used collections and collection types to improve data upload performance.
  • Coordinated with the QA team regularly on test scenarios and functionality.
  • Organized knowledge-sharing sessions with the PS team.
  • Identified and created missing DB links and indexes, and analyzed tables, which helped improve the performance of poorly running SQL queries.
  • Involved in both logical and physical model design.
  • Extensively worked with DBA Team for refreshing the pre-production databases.
  • Created index organized tables
  • Simultaneously worked on multiple applications.
  • Involved in estimating the effort required for the database tasks
  • Involved in fixing production bugs both within and outside assigned projects.
  • Explained issues in layman's terms to help BSAs understand.
  • Executed jobs in the Unix environment.
  • Involved in learning Hadoop technologies and coding a couple of Hadoop scripts.
  • Hands-on experience developing, installing, configuring, and using Hadoop and its ecosystem components, such as MapReduce, HDFS, Hive, Sqoop, Pig, Flume, Kafka, and Spark.
  • Involved in many dry-run activities to ensure smooth production releases.
  • Involved extensively in creating a release plan during the project Go-Live

Environment: PL/SQL, SQL, Informatica, Oracle 11g, OLTP, OLAP, Unix.

Confidential, Dallas, TX

ETL Developer

Roles & Responsibilities:

  • Involved in all phases of software development life cycle.
  • Interacted with data architects and business users to develop data profiles and analyzed source data.
  • Involved in requirements gathering and analysis to define functional specifications.
  • Created various logical and physical data models interacting with the business team, leads and developers using ERWIN.
  • Interacted with report users and business analysts to understand and document requirements and translate them into technical specifications for designing the Informatica mappings.
  • Involved in extracting source data from Oracle, SQL Server 2000, and flat files on different systems.
  • Extensively used Informatica Client tools - Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, Informatica Repository Manager and Informatica Workflow Manager.
  • Developed number of Informatica mappings based on business requirements using various transformations like Dynamic Lookup, Connected and Unconnected lookups, Filter, Stored procedure, Update Strategy, Joiner, Aggregator, Expression, Router, Sequence generator and Normalizer.
  • Created Connected, Unconnected and Dynamic lookup transformation for better performance and increased the cache file size based on the size of the lookup data.
  • Extensively tested the mappings by running the queries against Source and Target tables and by using Break points in the Debugger.
  • Used Informatica features to implement Type 1 and Type 2 changes in slowly changing dimension tables.
  • Used version control to check in and checkout versions of objects.
  • Used Informatica Designer to create reusable transformations to be used in Informatica mappings and mapplets.
  • Moved the Informatica objects to global repository after finalizing the design in the local repository.
  • Created and scheduled sessions and jobs (on demand, run on time, and run only once) using Workflow Manager.
  • Worked on fixing poorly designed mappings, workflows, worklets, sessions, and target data loads for better performance.
  • Extensively worked on performance tuning of sources, targets, mappings, and SQL queries in the transformations.
  • Wrote PL/SQL stored procedures, functions, packages, and triggers to implement business rules in the application.
  • Responsible for SQL tuning and optimization using Explain Plan, TKPROF utility and optimizer hints.
  • Wrote UNIX shell scripts to work with flat files, to define parameter files and to create pre and post session commands.
  • Used pmcmd commands in non-Windows environments.
  • Worked as part of a team and provided 24x7 support when required.
  • Participated in Enhancements meeting to distinguish between bugs and enhancements.

Environment: Informatica PowerCenter 8.5, Erwin, Oracle 10g, SQL, PL/SQL, TOAD, Flat Files, UNIX Shell Scripting, Sun Solaris 5.8 and Windows 2000.

Confidential, Dallas, TX

Oracle Developer

Roles & Responsibilities:

  • Gathered requirements and system specifications from the business users.
  • Developed PL/SQL Packages, Procedures, Functions, Triggers, Views, Indexes, Sequences and Synonyms.
  • Extensively involved in tuning slow performing queries, procedures and functions.
  • Extensively used collections and collection types to improve the data upload performance.
  • Involved in working with the ETL team to load data from Oracle 10g into Teradata.
  • Coordinated with the QA team regularly on test scenarios and functionality.
  • Organized knowledge-sharing sessions with the PS team.
  • Identified and created missing DB links and indexes, and analyzed tables, which helped improve the performance of poorly running SQL queries.
  • Involved in both logical and physical model design.
  • Extensively worked with DBA Team for refreshing the pre-production databases.
  • Worked closely with JBOSS team in providing the data needs.
  • Worked on the APEX tool, which is used to create and store customer store information.
  • Created index organized tables
  • Closely worked with SAP systems.
  • Simultaneously worked on multiple applications.
  • Involved in estimating the effort required for the database tasks
  • Involved in fixing production bugs both within and outside assigned projects.
  • Explained issues in layman's terms to help BSAs understand.
  • Executed jobs in the Unix environment.
  • Involved in many dry-run activities to ensure smooth production releases.
  • Involved extensively in creating a release plan during the project Go-Live
  • Coordinated with the DBA team to gather Statspack reports for a given time frame, showing the database load and activity during that period.

Environment: PL/SQL, SQL, ETL, Oracle 10g, JBoss, APEX, SAP, Unix.

Confidential

Oracle Developer

Roles & Responsibilities:

  • Worked on designing the content and delivering the solutions based on understanding the requirements.
  • Wrote a web service client for order-tracking operations that accesses the web services API and is utilized in our web application.
  • Developed the user interface using JavaScript, jQuery, and HTML.
  • Used the AJAX API for intensive user operations and client-side validations.
  • Worked with Java, J2EE, SQL, JDBC, XML, JavaScript, and web servers.
  • Utilized servlets for the controller layer and JSP with JSP tags for the interface.
  • Worked on Model View Controller Pattern and various design patterns.
  • Worked with designers, architects, developers for translating data requirements into the physical schema definitions for SQL sub-programs and modified the existing SQL program units.
  • Designed and Developed SQL functions and stored procedures.
  • Involved in debugging and bug fixing of application modules.
  • Efficiently dealt with exceptions and flow control.
  • Worked on Object Oriented Programming concepts.
  • Added Log4j to log the errors.
  • Used Eclipse for writing code and SVN for version control.
  • Installed and used MS SQL Server 2008 database.
  • Spearheaded coding for site management which included change of requests for enhancing and fixing bugs pertaining to all parts of the website.

Environment: Java, JDK 1.8, Apache Tomcat 7, JavaScript, JSP, JDBC, Servlets, MS SQL Server, XML, Windows XP, Ant, Red Hat Linux, Eclipse Luna, SVN.
