
Hadoop Developer Resume


San Jose, CA

PROFESSIONAL SUMMARY:

  • Professional software developer with 4.5+ years of technical expertise across all phases of the Software Development Life Cycle (SDLC) in various industry sectors, specializing in Big Data analytics frameworks and ETL tools.
  • 3+ years of industry experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, Spark, Flume, Sqoop, Avro, AWS and ZooKeeper.
  • Strong understanding of Hadoop architecture and hands-on experience with Hadoop components such as YARN, NameNode, DataNode and the HDFS framework.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting the application architecture of those systems.
  • Developed customized UDFs and UDAFs in Java to extend Hive's core functionality (see the second sketch after this list).
  • Created Hive tables to store structured data in HDFS and processed them using HiveQL.
  • Worked extensively on Spark with Scala on a cluster for analytics; installed Spark on top of Hadoop and built advanced analytical applications by integrating Spark with Hive and SQL/Oracle.
  • Experience using DStreams, accumulators, broadcast variables and RDD caching for Spark Streaming (see the first sketch after this list).
  • Hands-on experience developing Spark applications using RDD transformations, Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
  • Extensive knowledge of AWS cloud infrastructure - RDS, Redshift, DynamoDB, EC2, EMR, Route53, CloudWatch, Lambda and IAM.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Strong testing and debugging skills, with exposure to the complete software development life cycle from requirements gathering to product release.
  • Good Experience in creating Business Intelligence solutions and designing ETL workflows using Tableau.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Proven ability to manage all stages of project development; strong problem-solving and analytical skills and the ability to make balanced, independent decisions.
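
For context on the Spark bullets above, a minimal PySpark sketch of broadcast variables and RDD caching; the lookup dictionary and HDFS path are hypothetical, not taken from any project described here.

from pyspark import SparkContext

sc = SparkContext(appName="broadcast-cache-sketch")

# Small lookup table shipped once to every executor as a broadcast variable.
country_names = {"US": "United States", "IN": "India"}
bc_names = sc.broadcast(country_names)

# Hypothetical input: one "user,countryCode" record per line.
events = sc.textFile("hdfs:///data/events")

# Enrich each record via the broadcast lookup; cache the result because
# it feeds two separate actions below.
enriched = (events.map(lambda line: line.split(","))
                  .map(lambda f: (f[0], bc_names.value.get(f[1], "Unknown")))
                  .cache())

print(enriched.count())                                        # materializes the cache
print(enriched.filter(lambda kv: kv[1] == "Unknown").count())  # reuses cached partitions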

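The UDF/UDAF bullet above refers to Java classes extending Hive's UDF API; purely as a Python stand-in, the same idea expressed as a PySpark UDF registered for use from HiveQL. The function and table names are made up.

from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

def mask_email(email):
    # Redact the local part of an e-mail address, keeping the domain.
    if email is None or "@" not in email:
        return None
    return "***@" + email.split("@", 1)[1]

# Register so the function can be called from SQL like a built-in.
spark.udf.register("mask_email", mask_email, StringType())

spark.sql("SELECT mask_email(email) FROM customers LIMIT 10").show()
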
TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, Spark, YARN, Hive, Pig, Sqoop, Flume

Hadoop Distributions: Cloudera (CDH5), Hortonworks, Apache

Languages: Python, SQL and Java

NoSQL Databases: MongoDB and Amazon DynamoDB

Cloud Computing Tools: Amazon Web Services (AWS)

DB Languages: PL/SQL

RDBMS: Oracle 11g, MySQL, Teradata

Development Tools: Microsoft SQL Server Management Studio, Toad, Eclipse

Development Methodologies: Agile/Scrum

Visualization and Analytics Tools: Tableau

Operating Systems: UNIX, Red Hat Linux, Mac OS and Windows variants

ETL Tools: Informatica

PROFESSIONAL EXPERIENCE:

Confidential, San Jose, CA

Hadoop Developer

Responsibilities:

  • Handled large datasets during the ingestion process using partitions, Spark in-memory capabilities, broadcast variables, transformations and other optimizations.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Created AWS Lambda functions triggered by SNS and CloudWatch Events to run scheduled daily jobs (see the first sketch after this list).
  • Worked on Spark to build BI reports in Tableau; Tableau was integrated with Spark using Spark SQL.
  • Monitored and debugged Spark jobs running on a Spark cluster using Cloudera Manager.
  • Worked with different file formats such as text, Parquet and ORC for Hive querying and processing based on business logic (see the second sketch after this list).
  • Worked extensively with Hive DDL and the Hive Query Language (HiveQL).
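
A minimal sketch of the scheduled-job pattern in the Lambda bullet above: a CloudWatch Events (EventBridge) rule or an SNS notification invokes the handler once a day. The cluster id, step name and script path are placeholders, not details from this resume.

import json
import boto3

emr = boto3.client("emr")

def handler(event, context):
    # SNS wraps its payload in Records; scheduled CloudWatch events arrive directly.
    records = event.get("Records")
    payload = json.loads(records[0]["Sns"]["Message"]) if records else event

    # Example daily action: add a step to a long-running EMR cluster.
    emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXXX",          # placeholder cluster id
        Steps=[{
            "Name": "daily-ingest",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://bucket/jobs/daily_ingest.py"],
            },
        }],
    )
    return {"status": "submitted", "trigger": payload.get("source", "sns")}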

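And for the file-format bullet: a short PySpark sketch that reads delimited text and persists it as Parquet and ORC for downstream Hive queries; all paths are illustrative.

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

df = (spark.read
           .option("delimiter", "|")
           .option("header", "true")
           .csv("hdfs:///raw/transactions"))

# Columnar formats cut scan time for analytical Hive queries.
df.write.mode("overwrite").parquet("hdfs:///curated/transactions_parquet")
df.write.mode("overwrite").orc("hdfs:///curated/transactions_orc")
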
Environment: Python, Hadoop, Apache Spark, MapReduce, Amazon Web Services, CDH 5.9, Cloudera Manager, Control M Scheduler, Shell Scripting, Agile Methodology, Kafka, JIRA, HBase, Git, Tableau

Confidential, San Francisco, CA

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs for data cleaning and preprocessing (see the first sketch after this list).
  • Involved in creating Hive tables, writing complex Hive queries to populate Hive tables.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for dashboard reporting (see the second sketch after this list).
  • Optimized HiveQL scripts by using the Tez execution engine.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Implemented Sqoop for large dataset transfers between Hadoop and RDBMS.
  • Used different file formats such as text, Avro, Parquet and ORC for Hive querying and processing based on business logic.
  • Used JIRA to create user stories and created branches in Bitbucket repositories based on each story.
  • Created various repositories and managed version control using Git.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
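
A minimal sketch of the data-cleaning MapReduce work above, written as a Hadoop Streaming mapper in Python (a map-only job, run via the hadoop-streaming jar with -mapper mapper.py and -numReduceTasks 0); the record layout is assumed.

#!/usr/bin/env python
# mapper.py: drop malformed records, normalize the rest.
import sys

EXPECTED_FIELDS = 5  # assumed record width, for illustration only

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != EXPECTED_FIELDS:
        continue  # discard malformed records
    print("\t".join(f.strip().lower() for f in fields))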

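And for the partitioned/bucketed Hive analysis above, a PySpark sketch that writes a partitioned, bucketed table and runs a partition-pruned query; database, table and column names are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

clicks = spark.read.parquet("hdfs:///raw/clickstream")

# Partition by date so queries for one day scan only that day's files;
# bucket by user_id to speed up joins and aggregations on that key.
(clicks.write
       .partitionBy("event_date")
       .bucketBy(32, "user_id")
       .sortBy("user_id")
       .mode("overwrite")
       .saveAsTable("analytics.clickstream"))

# Partition pruning: only event_date='2017-06-01' files are read.
spark.sql("""
    SELECT url, COUNT(*) AS hits
    FROM analytics.clickstream
    WHERE event_date = '2017-06-01'
    GROUP BY url
""").show()
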
Environment: Python, Hadoop, MapReduce, CDH 5.9, Cloudera Manager, Control M Scheduler, Shell Scripting, Agile Methodology, JIRA, Git, Tableau.

Confidential, San Francisco, CA

Informatica ETL Developer

Responsibilities:

  • Developed robust mappings in the Informatica Designer using various transformations, including Source Qualifier, Expression, Lookup, Stored Procedure, Aggregator, Sequence Generator, Update Strategy, Joiner, Filter, Router and Union (a rough PySpark equivalent of this logic follows this list).
  • Performed coding, testing and implementation of Informatica mappings and workflows.
  • Mentored team members on support activities.
  • Migrated ETL code from Development to Test to Production environments.
  • Resolved high-severity batch and online problems in the production system.
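
Informatica mappings are assembled in a GUI rather than written as code; purely to illustrate the Filter / Expression / Lookup / Joiner / Router logic named above, a rough PySpark equivalent with made-up table and column names.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

orders = spark.table("staging_orders")
customers = spark.table("dim_customer")

result = (orders
          .filter(F.col("status") == "ACTIVE")        # Filter
          .withColumn("net_amount",                   # Expression
                      F.col("amount") - F.col("discount"))
          .join(customers, "customer_id", "left"))    # Joiner / Lookup

# Router-style split into two targets.
high_value = result.filter(F.col("net_amount") >= 1000)
low_value = result.filter(F.col("net_amount") < 1000)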

Environment: Informatica PowerCenter 8.x/7.x, Oracle/DB2, UNIX, Control-M
