- Professional software developer with 4.5+ years of technical expertise in all phases of the software development life cycle (SDLC) across various industry sectors, specializing in Big Data analytics frameworks and ETL tools.
- 3+ years of industry experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, Spark, Flume, Sqoop, Avro, AWS and ZooKeeper.
- Strong understanding of Hadoop architecture and hands-on experience with Hadoop components such as YARN, NameNode, DataNode and HDFS.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Developed custom UDFs and UDAFs in Java to extend Hive's core functionality.
- Created Hive tables to store structured data in HDFS and processed it using HiveQL.
- Worked extensively with Spark using Scala on a cluster for analytics, installed it on top of Hadoop, and built advanced analytical applications using Spark with Hive and SQL/Oracle.
- Experience using DStreams, accumulators, broadcast variables and RDD caching for Spark Streaming.
- Hands-on experience developing Spark applications using RDD transformations, Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
- Extensive knowledge of AWS cloud infrastructure - RDS, Redshift, DynamoDB, EC2, EMR, Route53, CloudWatch, Lambda and IAM.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Strong testing and debugging skills, with exposure to the complete software development life cycle from requirements gathering to product release.
- Good Experience in creating Business Intelligence solutions and designing ETL workflows using Tableau.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Proven ability to manage all stages of project development, with strong problem-solving and analytical skills and the ability to make balanced, independent decisions.
Big Data Ecosystem: HDFS, MapReduce, Spark, YARN, Hive, Pig, Sqoop, Flume
Hadoop Distributions: Cloudera (CDH5), Hortonworks, Apache
Languages: Python, SQL and Java
NoSQL Databases: MongoDB and Amazon DynamoDB
Cloud Computing Tools: Amazon AWS
DB Languages: PL/SQL
RDBMS: Oracle 11c, MySQL, Teradata
Development Tools: Microsoft SQL Studio, Toad, Eclipse
Development methodologies: Agile/Scrum
Visualization and analytics tool: Tableau Software
Operating systems: UNIX, Red Hat Linux, Mac OS and Windows variants
ETL Tools: Informatica
Confidential, San Jose, CA
- Handled large datasets using partitions, Spark's in-memory capabilities, broadcast variables, transformations and other techniques during the ingestion process.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
- Created AWS Lambda functions triggered by SNS and CloudWatch events to run daily jobs on schedule.
- Built BI reports in Tableau on top of Spark; Tableau was integrated with Spark through Spark SQL.
- Monitored and debugged Spark jobs running on a Spark cluster using Cloudera Manager.
- Worked with different file formats such as text file, Parquet and ORC for Hive querying and processing based on business logic.
- Worked extensively with Hive DDL and Hive Query Language (HQL).
Environment: Scala, Python, Hadoop, Apache Spark, MapReduce, Amazon Web Services, CDH 5.9, Cloudera Manager, Control M Scheduler, Shell Scripting, Agile Methodology, JIRA, Git, Tableau.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs for data cleaning and preprocessing.
- Involved in creating Hive tables, writing complex Hive queries to populate Hive tables.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Optimized HiveQL scripts by using execution engines such as Tez.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Implemented Sqoop for large dataset transfers between Hadoop and RDBMS.
- Used different file formats like Text files, Avro, Parquet and ORC.
- Worked with different file formats such as text file and Parquet for Hive querying and processing based on business logic.
- Used JIRA to create user stories and created branches in Bitbucket repositories for each story.
- Knowledge of creating repositories and managing version control using Git.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Python, Hadoop, MapReduce, CDH 5.9, Cloudera Manager, Control M Scheduler, Shell Scripting, Agile Methodology, JIRA, Git, Tableau.
Informatica ETL Developer
- Used various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure, and Union to develop robust mappings in the Informatica Designer.
- Created Informatica mappings using various transformations like Source Qualifier, Expression, Lookup, Stored Procedure, Aggregator, Update Strategy, Joiner, Filter and Router.
- Performed coding, testing and implementation of Informatica mappings and workflows.
- Mentored team members on support activities.
- Migrated ETL code from Development to Test to Production.
- Resolved high-severity batch and online problems in the production system.
Environment: Informatica Power Center 8.x/7.x, Oracle/DB2, UNIX, Control M