Hadoop Developer Resume

Pleasanton, CA

SUMMARY

  • Around 8 years of experience with Spark, Hadoop, Big Data, and Java as a Hadoop Developer.
  • Excellent understanding of Hadoop architecture and underlying framework including storage management.
  • Experience in using various Hadoop ecosystem components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume, and Storm for data storage and streaming analysis.
  • Experience in developing custom UDFs for Pig and Hive to incorporate Java methods and functionality into Pig Latin and HiveQL (a minimal UDF sketch follows this summary).
  • Hands-on experience in core Java for MapReduce concepts and HDFS.
  • Experience with the Oozie scheduler in setting up workflows combining MapReduce and Pig jobs.
  • Expertise in implementing Spark using Scala and Spark SQL for faster testing and processing of data.
  • Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Experience in managing Hadoop clusters and services using Cloudera Manager.
  • Experience in troubleshooting errors in HBase Shell/API, Pig, Hive and MapReduce.
  • Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop.
  • Collected log data from various sources and integrated it into HDFS using Flume.
  • Assisted Deployment team in setting up Hadoop cluster and services.
  • Hands-on experience in setting up Apache Hadoop and Cloudera CDH clusters on Ubuntu, Fedora, and Windows (Cygwin) environments.
  • Excellent understanding of virtualization, with experience setting up a POC multi-node virtual cluster leveraging underlying bridge networking and NAT technologies.
  • Experience in loading data into HDFS from Linux (Ubuntu, Fedora, CentOS) file systems.
  • Knowledge of Apache Kafka.
  • Experienced in designing, developing, documenting, and testing ETL jobs and mappings in server and parallel jobs using DataStage (8.1/8.5/8.7) to populate tables in data warehouses and data marts.
  • Proficient in developing UNIX shell scripts.
  • Hands-on experience with the Teradata SQL Assistant query tool.
  • Good knowledge of Oracle, Teradata, MySQL, DB2, Netezza, and other RDBMSs.
  • Successfully completed a POC on designing a conceptual model with Spark for performance optimization.
  • Good knowledge of Java, JavaScript, HTML.
  • Good knowledge of the retail domain.
  • Good presentation, data analysis, coordination, and teamwork skills.
  • Effectively worked in cross-functional and global environments to manage multiple tasks and assignments concurrently.
  • Good Working knowledge of Tableau.
  • Good working knowledge of CDH.
  • Knowledge of project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Knowledge of hardware, software, networking, and external tools including but not limited to Excel and Access, with experience in using their functionality as needed to enhance productivity and ensure accuracy.
  • Determined, committed and hardworking individual with strong communication, interpersonal and organizational skills.
  • Technology enthusiast, highly motivated and an avid blog reader, keeping track of latest advancements in hardware and software fields.
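The bullet on custom UDFs above refers to the kind of small Java class sketched below. This is a minimal, illustrative example only: the package, class name, and masking logic are assumptions, not taken from any actual project, and it uses the classic org.apache.hadoop.hive.ql.exec.UDF API.

```java
package com.example.hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative Hive UDF: masks all but the last four characters of a string column.
// Registered from HiveQL with (jar path and function name are placeholders):
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_tail AS 'com.example.hive.MaskTailUDF';
public class MaskTailUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                      // let Hive pass NULLs through
        }
        String value = input.toString();
        int keep = Math.min(4, value.length());
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < value.length() - keep; i++) {
            masked.append('*');
        }
        masked.append(value.substring(value.length() - keep));
        return new Text(masked.toString());
    }
}
```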

TECHNICAL SKILLS

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Cassandra, Oozie and Flume.

Spark Streaming Technologies: Spark, Kafka, Storm

Scripting Languages: Python, Scala and Bash.

Programming Languages: Java, SQL, JavaScript, HTML5, CSS3

Databases: RDBMS (Oracle, DB2, Netezza), NoSQL, data warehouses

Tools: Eclipse, JDeveloper, MS Visual Studio, Microsoft Azure HDInsight, Microsoft Hadoop cluster, JIRA, Lotus Notes, Notepad++, Teradata SQL Assistant, PuTTY, Rational ReqPro, Rational Rose, TSRM.

Methodologies: Agile, Waterfall, UML, Design Patterns.

Operating Systems: Unix/Linux/AIX

Machine Learning Skills (MLlib): Feature Extraction, Dimensionality Reduction, Model Evaluation, Clustering.

ETL/Reporting Tools: Confidential WebSphere DataStage and QualityStage 8.5/8.7, Informatica, Tableau.

PROFESSIONAL EXPERIENCE

Confidential, Pleasanton, CA

Hadoop Developer

Responsibilities:

  • Used Spark Streaming to consume topics from the distributed messaging source Kafka and periodically push batches of data into Spark for real-time processing (see the sketch following this list).
  • Involved in parsing JSON data into a structured format and loading it into HDFS/Hive using Spark Streaming.
  • Worked with teams to analyze anomaly detection and ratings of the data.
  • Created various Hive tables over structured, semi-structured, and unstructured data sources; designed and developed Hive managed/external tables using struct, map, and array types with various storage formats.
  • Ingested large volumes of RDBMS transactional data into HDFS using Sqoop import and analyzed them with Hive.
  • Developed complex HiveQL queries using the JSON SerDe.
  • Developed custom Hive user-defined functions (UDFs) based on business requirements.
  • Developed Oozie workflows for scheduling the ETL process and Hive scripts.
  • Stored received streams in memory as RDDs.
  • Set the batch interval to 60 seconds, configurable at any time through a configuration file.
  • Developed scripts to perform business transformations on the data using Hive and Pig.
  • Developed UDFs in Java for Hive and Pig.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Analyzed the SQL scripts and designed the solution for implementation in Scala.
  • Performed data analysis through Pig, MapReduce, and Hive.
  • Partitioned and bucketed the imported data using HiveQL.
  • Worked on Cassandra to store data for access by mobile applications.
  • Designed and developed the data ingestion component.
  • Provided cluster coordination services through ZooKeeper.
  • Imported data from Oracle into HDFS using Sqoop.
  • Imported and exported data between HDFS and the relational database Teradata using Sqoop.
  • Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing.
  • Used HBase, Pig, Flume, Hive, and Sqoop.
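A minimal sketch of the Spark Streaming pipeline described in the bullets above, assuming the spark-streaming-kafka-0-10 connector. The broker address, consumer group, topic name, and Hive table are placeholders; only the 60-second batch interval comes from the bullet above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaToHiveStreamingJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaToHiveStreamingJob");
        // 60-second batch interval, as described above; in practice this would be
        // read from a configuration file rather than hard-coded.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(60));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");         // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "ratings-consumer");              // placeholder group id
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("ratings"), kafkaParams)); // placeholder topic

        // Each micro-batch: parse the JSON payloads and append them to a Hive table.
        stream.map(ConsumerRecord::value).foreachRDD(rdd -> {
            if (!rdd.isEmpty()) {
                SparkSession spark = SparkSession.builder()
                                                 .enableHiveSupport()
                                                 .getOrCreate();
                spark.read().json(rdd)                         // infer schema from JSON strings
                     .write().mode("append")
                     .saveAsTable("analytics.ratings_raw");    // placeholder Hive table
            }
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```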

Environment: Java, ETL (Confidential DataStage), Linux, Teradata, DB2, shell scripting, HDFS cluster, Hive, Pig, Sqoop, Oozie, MapReduce, Pentaho, Cassandra.

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Responsible for preparing technical specifications and analyzing functional specifications for DataStage.
  • Working as a team member and tracking the development progress.
  • Development and maintenance of code.
  • End-to-end testing of the data warehouse.
  • Worked on a live 60-node Hadoop cluster.
  • Worked with highly unstructured and semi-structured data of 90 TB in size.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
  • Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
  • Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Solved performance issues in Hive and Pig scripts by understanding joins, grouping, and aggregation and how they translate into MapReduce jobs.
  • Developed UDFs in Java as needed for use in Pig and Hive queries (see the sketch following this list).
  • Experience using SequenceFile and Avro file formats.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
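A minimal sketch of the kind of Java UDF used from Pig queries, as mentioned in the bullets above. The package, class name, and normalization logic are illustrative assumptions, not taken from any actual project.

```java
package com.example.pig;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Illustrative Pig UDF: normalizes a chararray field by trimming and upper-casing it.
// Used from Pig Latin with (jar path and field name are placeholders):
//   REGISTER udfs.jar;
//   DEFINE NORMALIZE com.example.pig.NormalizeField();
//   cleaned = FOREACH raw GENERATE NORMALIZE(customer_name);
public class NormalizeField extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                       // propagate nulls unchanged
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```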

Environment: Java, ETL (Confidential DataStage), Linux, DB2, shell scripting, HDFS cluster, Hive, Pig, Sqoop, Oozie, MapReduce, Pentaho.

Confidential, CA

Hadoop Developer

Responsibilities:

  • Analyzed the low-level design (LLD) document.
  • Understood the program logic.
  • Prepared sample code in the Hadoop environment.
  • Developed new programs.
  • Reviewed code to maintain client standards.
  • Involved in impact analysis and design.
  • Involved in unit testing, integration testing, and production parallel testing.
  • Prepared issue logs to explain issues to clients.

Environment: Java, ETL (Confidential DataStage), Linux, DB2, shell scripting, HDFS cluster, Hive, Pig, Sqoop, Oozie, MapReduce, Pentaho.

Confidential, MN

ETL/Hadoop Developer

Responsibilities:

  • Involved in developing Pig scripts corresponding to mainframe JCLs.
  • Wrote Sqoop jobs for moving data between relational databases and HDFS.
  • Involved in enhancement of jobs as per the given requirements.
  • Involved in unit testing, system integration testing for the Hadoop jobs.
  • Designed the jobs in the Control-M automated scheduling tool.
  • Providing support for the migrated jobs.

Environment: Hadoop (Pig Latin), Java, PuTTY, Control-M, DB2, Hive

Confidential, UK

Hadoop Developer

Responsibilities:

  • Key member in identifying the cloud service and determining how to use that environment.
  • Involved in data analysis and requirements analysis.
  • Involved in architecture design discussions.
  • Worked on a large dataset of around 1 TB.
  • Performed data extraction and transformation using Pig.
  • Created a data warehouse using Hive.
  • Worked on various requirements and generated results using HiveQL.
  • Generated reports using BigSheets.
  • Trained additional resources by sharing Hadoop knowledge.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.

Environment: Hadoop (Pig Latin), Java, PuTTY, Control-M, MapReduce, HDFS, Pig, Hive, Sqoop, BigSheets, and Cloud.

Confidential, US

ETL/Hadoop Developer

Responsibilities:

  • Responsible for running jobs in the ETL phase.
  • Used DataStage Designer to create processes for extracting, cleansing, transforming, integrating, and loading data into the data warehouse database.
  • Extensively used the DataStage IIS console to deploy and run web services as DataStage jobs.
  • Extensively used DataStage Administrator to change parameters.
  • Extensively used db2top and topas to monitor performance.
  • Worked on shell scripting to automate the process.
  • Extensively worked on root cause analysis for the defects on Production.

Environment: DataStage (ETL) 8.5, DB2, AIX, Netezza, Lotus Notes

Confidential, US

ETL Developer

Responsibilities:

  • Responsible for running jobs in the ETL phase.
  • Used DataStage Designer to create processes for extracting, cleansing, transforming, integrating, and loading data into the data warehouse database.
  • Scheduled jobs using DataStage Director.
  • Involved in writing SQL/PLSQL queries for extraction of data for business purposes in production.

Confidential

ETL Developer

Responsibilities:

  • Involved in understanding user requirements, design, development and documentation for ETL mappings.
  • Designed and developed mappings using varied transformations such as Source Qualifier, Aggregator, Expression, Rank, Router, Filter, Update Strategy, and Joiner.
  • Imported source and target tables from their respective databases.
  • Developed Informatica mappings, mapplets, and transformations for migration of data from existing systems to the new system using Informatica Designer.
  • Optimized performance through tuning at the mapping and transformation level.
  • Involved in designing mapplets and reusable transformations according to business needs.
  • Involved in performance tuning at various levels including target, source, and mapping; performed SCD Type 1 and SCD Type 2 mappings.
