Senior Cloud/Data Engineer Resume
SUMMARY
- A Google Certified Associate Cloud Engineer and big data developer with 12+ years of IT experience designing, implementing and supporting Cloud, Big Data and data warehouse applications.
- 8+ years of Hadoop experience in the design and development of Big Data applications, including developing Spark/Scala jobs to process large volumes of data.
- Implemented Spark jobs in Scala to process large volumes of data in daily batch jobs.
- More than a year of experience implementing cloud solutions using BigQuery, Cloud Composer (Airflow), Cloud SQL, Cloud Storage, Cloud Functions and Stackdriver.
- Rich experience with Apache Hadoop MapReduce, YARN, Spark, Pig, Sqoop, Hue, Flume, Kafka and Oozie.
- Experience in handling huge volumes of streaming messages from Flume/Kafka.
- Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN and the MapReduce framework.
- Solid knowledge of the Software Development Life Cycle (SDLC), OOP, Agile methodology and the Scrum process, data warehouse concepts and database management practices.
- Extensive experience extracting, transforming and loading data from various sources, including RDBMS, flat files and XML, into data warehouses and data marts.
- Implemented solutions to move on-prem data into predefined GCP storage locations.
- Wrote data from Spark DataFrames to temporary tables in BigQuery.
- Implemented fact and dimension tables in BigQuery.
- Involved in designing and implementing the Spark framework code.
- Created Spark SQL queries and DataFrames with the Scala API to read Parquet data and create and load the RSP tables in Impala.
- Created and updated Pentaho jobs to extract, transform and load data from Hive tables into Impala tables in Snappy-compressed Parquet format for BA reporting.
- Created web services to transfer internal claims to Guidewire PolicyCenter.
- Good interpersonal skills with the ability to handle multiple tasks and priorities; self-motivated and quick to understand and apply new concepts.
- Ability to work independently with minimal supervision and to manage multiple projects in a fast-paced environment to meet deadlines, and an excellent team player.
TECHNICAL SKILLS
Insurance LOB: Personal Auto, Personal Property, Commercial Auto, Commercial Property.
ETL Tools: Informatica PowerCenter, Pentaho 7.1
Languages: Shell Scripting, SQL, Java, Scala and Python.
Big Data Technologies: MapReduce, Hive, Impala, Spark, Sqoop, Kafka and Flume.
Cloud Technologies: BigQuery, Cloud Storage, Cloud Functions, Dataflow and Pub/Sub.
RDBMS: Oracle 11g, DB2, MySQL.
OS: Windows, UNIX, Linux.
Version Control/BTS: CVS, Visual SourceSafe, TortoiseSVN, JIRA, Bitbucket, Sourcetree, GitHub, Confluence.
PROFESSIONAL EXPERIENCE
Confidential
Senior Cloud/Data Engineer
Responsibilities:
- Involved in user interactions, requirement analysis and design for the interfaces.
- Created Cloud Functions triggered by the GCS bucket finalize event when a file is uploaded to Cloud Storage, storing the file information in a technical metastore.
- Involved in implementing the data collection and batch ingestion frameworks using Python.
- Implemented solutions to move on-prem data into predefined GCP storage locations.
- Wrote data from Spark DataFrames to temporary tables in BigQuery (see the sketch below).
- Implemented fact and dimension tables in BigQuery.
- Worked with business users to create the pricing analytical model in BigQuery.
Environment: Cloud Storage, Confluent Kafka, Cloud Functions, Composer, Airflow, BigQuery, Python and GitHub.
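As an illustration of the DataFrame-to-BigQuery staging step described above, the following is a minimal Scala sketch assuming the open-source spark-bigquery connector is on the classpath; the bucket, dataset and table names are hypothetical placeholders, not the project's actual names.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object StageToBigQuery {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("stage-to-bigquery")
      .getOrCreate()

    // Read extracts that have already landed in a GCS bucket (path is hypothetical).
    val claims = spark.read.parquet("gs://landing-bucket/claims/dt=2020-01-01/")

    // Write the DataFrame to a temporary/staging BigQuery table; the connector
    // stages the data in the named GCS bucket before running the load job.
    claims.write
      .format("bigquery")
      .option("temporaryGcsBucket", "staging-bucket")
      .mode(SaveMode.Overwrite)
      .save("analytics_stage.claims_tmp")

    spark.stop()
  }
}
```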
Confidential
Big Data/Spark Developer
Responsibilities:
- Prepared the high-level and low-level design Confluence pages.
- Performed code reviews and supported the technical team on various activities.
- Involved in designing and implementing the Spark framework code.
- Created Spark SQL queries and DataFrames with the Scala API to read Parquet data and create and load the RSP tables in Impala (see the sketch below).
- Implemented Spark jobs in Scala to process large volumes of data in daily batch jobs.
Environment: HDFS, Spark, DataFrames, Scala, Impala, Jira, GitHub, Confluence and UNIX.
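The sketch below illustrates the read-Parquet/publish-for-Impala pattern described above in minimal Scala; the source path, column names and target table are hypothetical, and Impala would still need an INVALIDATE METADATA or REFRESH after the load to see the new table data.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object LoadRspTables {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("load-rsp-tables")
      .enableHiveSupport()      // share the metastore that Impala also reads
      .getOrCreate()

    // Source path and table/column names are hypothetical placeholders.
    val policies = spark.read.parquet("/data/raw/policies/")
    policies.createOrReplaceTempView("policies_raw")

    // Shape the raw feed with Spark SQL before publishing it for reporting.
    val rsp = spark.sql(
      """SELECT policy_id, line_of_business, premium_amount, effective_date
        |FROM policies_raw
        |WHERE effective_date >= '2019-01-01'""".stripMargin)

    // Write as a Parquet-backed table in the shared metastore so that both
    // Spark and Impala can query it without duplicating the data.
    rsp.write
      .mode(SaveMode.Overwrite)
      .format("parquet")
      .saveAsTable("reporting.rsp_policies")

    spark.stop()
  }
}
```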
Confidential
Big Data/Spark Developer
Responsibilities:
- Provided production support and monitored batch jobs in Control-M.
- Created and updated data correction jobs to fix production data issues.
- Implemented solutions to improve Impala/Hive query performance.
- Supported the implementation and drove it to a stable state in production.
- Troubleshot production data issues in financial reports and implemented solutions to keep audit checks healthy.
- Implemented changes to Impala/Hive queries and Sqoop jobs to support the Cloudera upgrade.
Environment: HDFS, Hive, Spark, Impala, Control-M, Maven, Jira, GitHub and ServiceNow.
Confidential
Pentaho/ETL Developer
Responsibilities:
- Prepared the high-level and low-level design documents. Performed code reviews and supported the technical team on various activities.
- Designed, developed and executed Pentaho MapReduce jobs to parse the various input XMLs into CSV files.
- Created and updated Pentaho jobs to extract, transform and load data from Hive tables into Impala tables in Snappy-compressed Parquet format for BA reporting.
- Created web services to transfer internal claims to Guidewire PolicyCenter.
Environment: MapReduce, HDFS, Pentaho 7.1/8.3, Hive, Impala, Jira, GitHub, Confluence.
Confidential
Hadoop Developer
Responsibilities:
- Collected a variety of large datasets across business systems and applications and imported them into Hive and HDFS using the data ingestion tools Sqoop and Flume.
- Designed, developed and executed Java MapReduce programs to merge small files and split big files according to the HDFS block size.
- Designed, developed and executed Java MapReduce programs to parse the various input files into CSV files.
- Created and updated mappings to extract, transform and load historical data into Hive tables.
- Implemented business logic by writing Pig UDFs and Hive generic UDFs in Java, and used various UDFs from Piggybank and other sources (a simplified sketch follows this role).
- Participated in peer design and code reviews.
- Responsible for DD as Scrum Master, handling the sprint roadmap, sprint plans, daily scrums, sprint demos, sprint retrospectives and sprint execution.
Environment: Java, MapReduce, HDFS, Sqoop, Flume, Pig, Hive, XML, JSON, MySQL, Linux and Oozie.
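The UDFs described in this role were written in Java against Hive's generic UDF API; purely to illustrate the pattern, here is a minimal sketch using Hive's simpler UDF base class, written in Scala (any JVM language works) with a hypothetical normalization rule.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical cleanup rule: strip non-alphanumerics from a policy number
// and upper-case it. Hive discovers the evaluate method via reflection.
class NormalizePolicyNumber extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.replaceAll("[^A-Za-z0-9]", "").toUpperCase)
  }
}
```

After packaging the class into a jar, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.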
Confidential
ETL/Java Developer
Responsibilities:
- Implemented ETL jobs using a UNIX scheduler.
- Developed and executed mappings, workflows, worklets and sessions using Informatica.
- Coordinated with the business group and QA team to identify issues.
- Created and executed the unit test plans.
- Participated in system design and technical solutions.
- Implemented requirements and enhancements, including coding, testing and debugging.
- Fixed bugs reported by the QA team and the business group.
Environment: Informatica, Windows XP, DB2, Sybase, Oracle 10g, Java, Eclipse, Swing, XML, Quality Center.