Hadoop Developer Resume San Jose, CA - Hire IT People

SUMMARY:

Overall 7+ years of experience in all phases of software application requirement analysis, design, development and maintenance of Hadoop/Big Data applications using Spark, Kafka, DynamoDB, SQS, S3, EMR, Hive, Sqoop and Redis, and of applications written in Java and Scala, tailored to industry needs.
Hands on experience with Spark Core, Spark SQL, Spark Streaming.
Used Spark SQL to perform transformations and actions on data residing in Hive.
Used Kafka & Spark Streaming for real-time processing.
Ability to spin up AWS resources such as EC2, EBS, S3, SQS, SNS, Lambda, Redis and EMR using CloudFormation templates.
Hands on experience with Amazon DynamoDB integrating with Spark.
Ensure data integrity and data security on AWS technology by implementing AWS best practices.
Templated AWS infrastructure as code with Terraform to build out staging and production environments.
Deployed instances in AWS EC2 and used EBS stores for persistent storage and also performed access management using IAM service.
3+ years of hands-on experience with the Big Data ecosystem, including Hadoop and YARN, Spark, Kafka, DynamoDB, Redshift, SQS, SNS, Hive, Sqoop, Flume, Pig, Oozie, MapReduce and Zookeeper, in industries such as the financial sector and health care.
Good knowledge of Teradata.
Experience with migrating data to and from RDBMS and unstructured sources into HDFS using Sqoop.
Hands-on experience in coding MapReduce/YARN programs in Java and Scala for analyzing big data.
Good knowledge of Apache Spark data processing to handle data from RDBMS and streaming sources with Spark Streaming.
Experience in data warehousing and ETL processes, and strong database, SQL, ETL and data analysis skills.
Experience in developing HiveQL scripts for data analysis and ETL purposes, and in extending the default functionality by writing User Defined Functions (UDFs) for data-specific processing.
Experience in designing and developing POCs in Scala, deploying them on the YARN cluster, and comparing the performance of Spark with Hive and SQL/Teradata.
Extensive experience in using Flume to transfer log data files to Hadoop Distributed File System.
Good understanding/knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode and MapReduce programming paradigm.
Tested various flume agents, data ingestion into HDFS, retrieving and validating snappy files.
Good skills in writing Spark jobs in Scala for processing large sets of structured and semi-structured data and storing them in HDFS.
Good Knowledge in Spark SQL queries to load tables into HDFS to run select queries on top.
Experience in NoSQL column-oriented databases like Cassandra, HBase and MongoDB, and their integration with a Hadoop cluster.
Experience in writing Hive Queries for processing and analyzing large volumes of data.
Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
Developed Oozie workflows by integrating all tasks relating to a project and schedule the jobs as per requirements.
Automated all the jobs, for pulling data from upstream server to load data into Hive tables, using Oozie workflows.
Experience with various scripting languages like Linux/Unix shell scripts.
Implemented several optimization mechanisms like Combiners, Distributed Cache, Data Compression, and Custom Partitioner to speed up the jobs.

TECHNICAL SKILLS:

Big Data/Hadoop: HDFS, Hive, Pig, Sqoop, Oozie, Flume, Impala, Zookeeper, Kafka, MapReduce, Cloudera, Amazon EMR.

Spark Components: Spark Core, Spark SQL, Spark Streaming.

Programming Languages: SQL, Scala, Java, Python and Unix Shell Scripting.

Databases & NoSQL: MySQL, Pig Latin, HiveQL, Teradata, RDBMS.

Cloud: Amazon EMR, EC2, EBS, S3, SQS, SNS, Lambda, Redshift.

Operating Systems: Windows, Unix, Red Hat Linux.

PROFESSIONAL EXPERIENCE:

Confidential - San Jose, CA

Hadoop Developer

Interacting with multiple teams to understand their business requirements for designing flexible, common components.
Validating the source file for data integrity and data quality by reading header and trailer information and performing column validations.
Used Spark SQL to load Hive tables into Spark for faster processing of data.
Used Hive to do transformations, joins, filter and some pre-aggregations before storing the data.
Visualized selected data sets with PySpark in Jupyter Notebook.
Validating and visualizing the data in Tableau.
Used Spark API over Cloudera Hadoop YARN to perform analytics on data.
Working with Offshore and onsite teams for Sync up.
Developed POC using Scala, Spark SQL and MLlib libraries along with Kafka and other tools as per requirement then deployed on the Yarn cluster.
Using Hive extensively to create views for the feature data.
Creating and maintaining automation jobs for different Data sets.
Working with platform and Hadoop teams closely for the needs of the team.
Using Kafka for Data ingestion for different data sets.
Experienced in importing and exporting data into HDFS and assisted in exporting analyzed data to RDBMS using Sqoop.
Created Sentry policy files to give business users access to the required databases and tables from Impala in the dev, UAT and prod environments.
Created and validated Hive views using Hue.
Created deployment document and user manual to do validations for the dataset.
Created Data Dictionary for Universal data sets.
Created shell scripts and TES job scheduler for workflow execution to automate the loading into different data sets.
Developed Sqoop jobs to import the data from RDBMS and file servers into Hadoop.
Created heat maps, bar charts and plots using pyplot, NumPy, pandas and SciPy in Jupyter Notebook.
Working closely with EDNA and Platform teams for the team needs and platform issues.

Confidential - Richmond, VA

Spark/Hadoop Developer

Responsibilities:

Interacting with multiple teams to understand their business requirements for designing flexible, common components.
Validating the source file for data integrity and data quality by reading header and trailer information and performing column validations.
Used Spark SQL for creating data frames and performed transformations on data frames like adding schema manually, casting, joining data frames before storing them.
Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
Used Spark SQL to load Hive tables into Spark for faster processing of data.
Worked on Spark streaming using Apache Kafka for real time data processing.
Experience in creating Kafka producer and Kafka consumer for Spark streaming.
Used Hive to do transformations, joins, filter and some pre-aggregations before storing the data onto HDFS.
Installed and configured Hadoop, YARN, MapReduce, Flume, HDFS (Hadoop Distributed File System), developed multiple MapReduce jobs in Python for data cleaning.
Used Sqoop for importing and exporting data from Netezza, Teradata into HDFS and Hive.
Worked on three layers for storing data such as raw layer, intermediate layer and publish layer.
Created external Hive tables to store and query the loaded data.
Optimization techniques included partitioning and bucketing.
Used the Avro file format compressed with Snappy in intermediate tables for faster processing of data.
Used the Parquet file format for published tables and created views on the tables.
Created Sentry policy files to give business users access to the required databases and tables from Impala in the dev, UAT and prod environments.
Automated the jobs with Oozie and scheduled them with Autosys.
Used AWS to spin up EMR clusters to process large data sets stored in S3 and push them to HDFS.
Participated in evaluation and selection of new technologies to support system efficiency.
Participated in development and execution of system and disaster recovery processes.

Environment: Hadoop, Cloudera, Amazon AWS, HDFS, Hive, Impala, Spark, Autosys, Kafka, DynamoDB, Lambda, S3, SQS, SNS, Sqoop, Pig, Java, Scala, Eclipse, Tableau, Teradata, UNIX, Maven, SBT.

Confidential, Albany, NY

Hadoop Developer

Responsibilities:

Importing and exporting data into HDFS using Sqoop which included incremental loading.
Developed Hive queries for processing and data manipulation.
Worked on optimizing and tuning Hive to achieve optimal performance.
Experienced in defining and managing job flows using Oozie.
Responsible for managing data coming from different sources.
Supported Hive programs running on the cluster.
Involved in loading data from UNIX file system to HDFS.
Installed and configured Hive and wrote Hive UDFs.
Hands on Experience in Oozie Job Scheduling.
Involved in creating Hive tables, loading data and writing Hive queries.
Working closely with AWS to migrate the entire Data Centers to the cloud using VPC, EC2, S3, EMR, RDS, Splice Machine and DynamoDB services.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java, Scala, Hortonworks, Amazon EMR, EC2, S3.

Confidential, Santa Monica, CA

Hadoop Developer

Responsibilities:

Developed MapReduce jobs using Java API.
Wrote MapReduce jobs using Pig Latin.
Developed workflow using Oozie for running MapReduce jobs and Hive Queries.
Worked on Cluster coordination services through Zookeeper.
Worked on loading log data directly into HDFS using Flume.
Involved in loading data from LINUX file system to HDFS.
Responsible for managing data from multiple sources; BIDW and analytics knowledge.
Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
Responsible for managing data coming from different sources.
Assisted in exporting analyzed data to relational databases using Sqoop.
Experienced in importing and exporting data into HDFS and assisted in exporting analyzed data to RDBMS using Sqoop.
Implemented JMS for asynchronous auditing purposes.
Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
Experience in defining, designing and developing Java applications, especially using Hadoop MapReduce by leveraging frameworks such as Cascading and Hive.
Experience in documenting designs and procedures for building and managing Hadoop clusters.
Strong experience in troubleshooting operating system issues, cluster issues and Java-related bugs.
Experience in automating deployment, management and self-serve troubleshooting of applications.
Defined and evolved the existing architecture to scale with growth in data volume, users and usage.
Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
Installed and configured Hive and wrote Hive UDFs.
Experience in managing the CVS and migrating into Subversion.

Environment: Hadoop, HDFS, Hive, Flume, Sqoop, PIG, Eclipse, MySQL and RedHat, Java (JDK 1.6).

Confidential

Java Developer

Responsibilities:

Involved in preparing design, development and analysis documents shared with clients.
Developed web pages using the Struts framework, JSP, XML, JavaScript, Hibernate, Spring, HTML/DHTML and CSS; configured the Struts application and used its tag library.
Developed the application using Spring, Hibernate and Spring Batch, with SOAP and RESTful web services.
Used the Spring Framework at the business tier, and Spring's BeanFactory for initializing services.
Used AJAX, JavaScript to create interactive user interface.
Implemented client side validations using JavaScript & server side validations.
Developed a single-page application using AngularJS and Backbone.js.
Implemented Hibernate to persist the data into Database and wrote HQL based queries to implement CRUD operations on the data.
Developed an API to write XML documents from a database. Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
Database modeling, administration and development using SQL and PL/SQL in Oracle 11g.
Coded different deployment descriptors using XML. Generated Jar files are deployed on Apache Tomcat Server.
Involved in the development of presentation layer and GUI framework in JSP. Client-Side validations were done using JavaScript.
Involved in configuring and deploying the application using WebSphere.
Involved in code reviews and mentored the team in resolving issues.
Undertook the Integration and testing of the various parts of the application.
Developed automated Build files using ANT.
Used Subversion for version control and log4j for logging errors.
Conducted code walkthroughs and prepared test cases and test plans.

Environment: HTML5, JSP, Servlets, JDBC, JavaScript, JSON, jQuery, Spring, SQL, Oracle 11g, Tomcat 5, Eclipse IDE, XML, XSL, ANT.

Confidential

Associate Java Developer

Responsibilities:

Involved in the complete software development life cycle (SDLC) of the application, from requirement gathering and analysis to testing and maintenance.
Developed the modules based on MVC Architecture.
Developed UI using JavaScript, JSP, HTML and CSS for interactive cross browser functionality and complex user interface.
Created business logic using servlets and session beans and deployed them on Apache Tomcat server.
Created complex SQL Queries, PL/SQL Stored procedures and functions for back end.
Prepared the functional, design and test case specifications.
Performed unit testing, system testing and integration testing.
Developed unit test cases. Used JUnit for unit testing of the application.
Provided technical support for production environments: resolved issues, analyzed defects, and provided and implemented fixes. Resolved higher-priority defects as per the schedule.

Environment: Java, JSP, Servlets, Apache Tomcat, Oracle, JUnit, SQL
