
Hadoop Developer Resume

San Jose, CA

SUMMARY:

  • Overall 7+ years of experience in all phases of software application requirement analysis, design, development, and maintenance of Hadoop/Big Data applications such as Spark, Kafka, DynamoDB, SQS, S3, EMR, Hive, Sqoop, and Redis, and of Java and Scala applications tailored to industry needs.
  • Hands on experience with Spark Core, Spark SQL, Spark Streaming.
  • Used Spark SQL to perform transformations and actions on data residing in Hive (a minimal sketch appears after this list).
  • Used Kafka & Spark Streaming for real-time processing.
  • Ability to spin up AWS resources such as EC2, EBS, S3, SQS, SNS, Lambda, Redis, and EMR using CloudFormation templates.
  • Hands on experience with Amazon DynamoDB integrating with Spark.
  • Ensure data integrity and data security on AWS technology by implementing AWS best practices.
  • Templated AWS infrastructure as code with Terraform to build out staging and production environments.
  • Deployed instances in AWS EC2, used EBS volumes for persistent storage, and performed access management using the IAM service.
  • 3+ years of hands-on experience with Big Data ecosystems including Hadoop and YARN, Spark, Kafka, DynamoDB, Redshift, SQS, SNS, Hive, Sqoop, Flume, Pig, Oozie, MapReduce, and Zookeeper, across industries such as finance and healthcare.
  • Good knowledge of Teradata.
  • Experience migrating data from RDBMS and unstructured sources into HDFS, and back, using Sqoop.
  • Good knowledge of Apache Spark data processing for handling data from RDBMS and streaming sources with Spark Streaming.
  • Experience in data warehousing and ETL processes, with strong database, SQL, ETL, and data analysis skills.
  • Experience developing HiveQL scripts for data analysis and ETL, and extended the default functionality by writing User Defined Functions (UDFs) for data-specific processing.
  • Extensive experience in using Flume to transfer log data files to Hadoop Distributed File System.
  • Good understanding/knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode and MapReduce programming paradigm.
  • Tested various flume agents, data ingestion into HDFS, retrieving and validating snappy files.
  • Good skills in writing Spark jobs in Scala to process large sets of structured and semi-structured data and store them in HDFS.
  • Good knowledge of writing Spark SQL queries to load tables and run select queries on top of them.
  • Experience with NoSQL databases such as Cassandra, HBase, and MongoDB and their integration with Hadoop clusters.
  • Experience in writing Hive Queries for processing and analyzing large volumes of data.
  • Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
  • Developed Oozie workflows by integrating all tasks relating to a project and scheduled the jobs as per requirements.
  • Automated all jobs that pull data from upstream servers and load it into Hive tables, using Oozie workflows.
  • Experience with various scripting languages like Linux/Unix shell scripts.
  • Implemented several optimization mechanisms such as Combiners, Distributed Cache, data compression, and custom Partitioners to speed up jobs.
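
The Spark SQL bullet above refers forward to this sketch: a minimal, hypothetical example (in Java, one of the languages listed on this resume) of querying a Hive table through Spark SQL. The database, table, and column names are illustrative assumptions, not taken from any actual project.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Sketch: querying a Hive table through Spark SQL and writing the result
// back to Hive. Database, table, and column names are placeholders.
public class HiveSparkSqlSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hive-spark-sql-sketch")
                .enableHiveSupport()          // read tables registered in the Hive metastore
                .getOrCreate();

        // Transformation: filter and aggregate rows from a Hive table
        Dataset<Row> dailyCounts = spark.sql(
                "SELECT event_date, COUNT(*) AS events "
              + "FROM analytics.raw_events "
              + "WHERE event_date >= '2019-01-01' "
              + "GROUP BY event_date");

        // Action: materialize the result as a new Hive table
        dailyCounts.write().mode("overwrite").saveAsTable("analytics.daily_event_counts");

        spark.stop();
    }
}
```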

TECHNICAL SKILLS:

Big Data/Hadoop: HDFS, Hive, Pig, Sqoop, Oozie, Flume, Impala, Zookeeper, Kafka, MapReduce, Cloudera, Amazon EMR.

Spark Components: Spark Core, Spark SQL, Spark Streaming.

Programming Languages: SQL, Scala, Java, Python and Unix Shell Scripting.

Databases & NoSQL: MySQL, HiveQL, Pig Latin, Teradata, RDBMS.

Cloud: Amazon EMR, EC2, EBS, S3, SQS, SNS, Lambda, Redshift.

Operating Systems: Windows, Unix, Red Hat Linux.

PROFESSIONAL EXPERIENCE:

Confidential - San Jose, CA

Hadoop Developer

Responsibilities:

  • Interacting with multiple teams to understand their business requirements for designing flexible and common components.
  • Validating the source files for data integrity and data quality by reading header and trailer information and performing column validations.
  • Implemented Spark SQL to access Hive tables in Spark for faster data processing.
  • Used Hive for transformations, joins, filters, and pre-aggregations before storing the data.
  • Performed data visualization for selected data sets with PySpark in Jupyter Notebook.
  • Validating and visualizing the data in Tableau.
  • Working with offshore and onsite teams for sync-ups.
  • Using Hive extensively to create views for the feature data.
  • Creating and maintaining automation jobs for different data sets.
  • Working closely with the platform and Hadoop teams on the team's needs.
  • Using Kafka for data ingestion across different data sets (a producer sketch follows this list).
  • Imported and exported data into HDFS and assisted in exporting analyzed data to RDBMS using Sqoop.
  • Created Sentry policy files to give business users access to the required databases and tables from Impala in the dev, UAT, and prod environments.
  • Created and validated Hive views using Hue.
  • Created deployment document and user manual to do validations for the dataset.
  • Created Data Dictionary for Universal data sets.
  • Created shell scripts and TES job-scheduler entries for workflow execution to automate loading into different data sets.
  • Developed Sqoop jobs to import data from RDBMS and file servers into Hadoop.
  • Created heat maps, bar charts, and plot graphs using pyplot, NumPy, pandas, and SciPy in Jupyter Notebook.
  • Working closely with the EDNA and platform teams on team needs and platform issues.
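
The Kafka ingestion bullet above refers forward to this sketch: a minimal, hypothetical Kafka producer in Java that publishes records to an ingestion topic. The broker address, topic name, and record source are assumptions for illustration only.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch of a Kafka producer used for data-set ingestion. Broker address
// and topic name are illustrative placeholders.
public class DatasetIngestionProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one record per command-line argument to the ingestion topic
            for (String record : args) {
                producer.send(new ProducerRecord<>("dataset-ingest", record));
            }
            producer.flush();
        }
    }
}
```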

Confidential - Richmond, VA

Spark/Hadoop Developer

Responsibilities:

  • Interacted with multiple teams to understand their business requirements for designing flexible and common components.
  • Validated the source files for data integrity and data quality by reading header and trailer information and performing column validations.
  • Used Spark SQL to create DataFrames and performed transformations such as applying schemas manually, casting, and joining DataFrames before storing them.
  • Implemented Spark SQL to access Hive tables in Spark for faster data processing.
  • Worked on Spark Streaming using Apache Kafka for real-time data processing (a streaming sketch follows this list).
  • Created Kafka producers and consumers for Spark Streaming.
  • Used Hive for transformations, joins, filters, and pre-aggregations before storing the data on HDFS.
  • Used Sqoop for importing and exporting data from Netezza, Teradata into HDFS and Hive.
  • Worked with three layers for storing data: a raw layer, an intermediate layer, and a publish layer.
  • Created external Hive tables to store and query the loaded data.
  • Applied optimization techniques including partitioning and bucketing.
  • Used the Avro file format compressed with Snappy in intermediate tables for faster data processing.
  • Used the Parquet file format for published tables and created views on them.
  • Created Sentry policy files to give business users access to the required databases and tables from Impala in the dev, UAT, and prod environments.
  • Automated the jobs with Oozie and scheduled them with Autosys.
  • Used AWS to spin up EMR clusters to process large data sets stored in S3 and push them to HDFS.
  • Participated in evaluation and selection of new technologies to support system efficiency.
  • Participated in development and execution of system and disaster recovery processes.
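
The Spark Streaming bullet above refers forward to this sketch: a minimal, hypothetical example of Spark consuming a Kafka topic and persisting the payloads to HDFS as Parquet for a publish layer. Structured Streaming is shown for brevity; the broker, topic, and HDFS paths are assumptions.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

// Sketch of Spark reading a Kafka topic as a stream. Broker, topic, and
// path values are illustrative placeholders.
public class KafkaStreamSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-stream-sketch")
                .getOrCreate();

        // Read raw messages from the Kafka topic as a streaming DataFrame
        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")
                .option("subscribe", "transactions")
                .load()
                .selectExpr("CAST(value AS STRING) AS payload");

        // Persist the payloads to HDFS in Parquet for the publish layer
        StreamingQuery query = events.writeStream()
                .format("parquet")
                .option("path", "hdfs:///data/publish/transactions")
                .option("checkpointLocation", "hdfs:///checkpoints/transactions")
                .start();

        query.awaitTermination();
    }
}
```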

Environment: Hadoop, Cloudera, Amazon AWS, HDFS, Hive, Impala, Spark, Autosys, Kafka, DynamoDB, Lambda, S3, SQS, SNS, Sqoop, Pig, Java, Scala, Eclipse, Tableau, Teradata, UNIX, Maven, and SBT.

Confidential, Albany, NY

Hadoop Developer

Responsibilities:

  • Imported and exported data into HDFS using Sqoop, including incremental loads.
  • Developed Hive queries for processing and data manipulation.
  • Worked on optimizing and tuning Hive to achieve optimal performance.
  • Defined and managed job flows using Oozie.
  • Responsible for managing data coming from different sources.
  • Supported Hive programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs (a UDF sketch follows this list).
  • Hands on Experience in Oozie Job Scheduling.
  • Involved in creating Hive tables, loading data and writing Hive queries.
  • Worked closely with AWS to migrate entire data centers to the cloud using VPC, EC2, S3, EMR, RDS, Splice Machine, and DynamoDB services.
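
The Hive UDF bullet above refers forward to this sketch: a minimal, hypothetical UDF in Java using the classic org.apache.hadoop.hive.ql.exec.UDF base class. The class name and its normalization behavior are illustrative, not taken from any actual project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Sketch of a simple Hive UDF: trims and lower-cases a string column so
// queries can join on a normalized value. Semantics are illustrative.
public class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Such a UDF is typically packaged as a JAR, added to the Hive session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before being used in HiveQL.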

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java, Scala, Hortonworks, Amazon EMR, EC2, S3.

Confidential, Santa Monica, CA

Hadoop Developer

Responsibilities:

  • Developed MapReduce jobs using the Java API (a job sketch follows this list).
  • Wrote MapReduce jobs using Pig Latin.
  • Developed workflow using Oozie for running MapReduce jobs and Hive Queries.
  • Worked on Cluster coordination services through Zookeeper.
  • Worked on loading log data directly into HDFS using Flume.
  • Involved in loading data from LINUX file system to HDFS.
  • Responsible for managing data from multiple sources, applying BI/DW and analytics knowledge.
  • Ran Hadoop streaming jobs to process terabytes of XML-format data.
  • Responsible for managing data coming from different sources.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Imported and exported data into HDFS and assisted in exporting analyzed data to RDBMS using Sqoop.
  • Implemented JMS for asynchronous auditing purposes.
  • Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
  • Defined, designed, and developed Java applications, especially with Hadoop MapReduce, leveraging frameworks such as Cascading and Hive.
  • Documented designs and procedures for building and managing Hadoop clusters.
  • Troubleshot operating-system issues, cluster issues, and Java-related bugs.
  • Automated deployment, management, and self-serve troubleshooting of applications.
  • Defined and evolved the existing architecture to scale with growing data volume, users, and usage.
  • Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
  • Installed and configured Hive and wrote Hive UDFs.
  • Managed CVS repositories and migrated them to Subversion.
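
The MapReduce bullet above refers forward to this sketch: a minimal, hypothetical MapReduce job written against the Java API that counts token occurrences in its input, with a Combiner as mentioned in the summary. Class names and paths are illustrative.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch of a MapReduce job counting token occurrences; the input and
// output paths come from the command line.
public class EventCountJob {

    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);   // emit (token, 1) for each token
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));   // total count per token
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event-count");
        job.setJarByClass(EventCountJob.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);   // combiner to reduce shuffle volume
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```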

Environment: Hadoop, HDFS, Hive, Flume, Sqoop, Pig, Eclipse, MySQL, Red Hat Linux, Java (JDK 1.6).

Confidential

Java Developer

Responsibilities:

  • Involved in design, development, and analysis documents shared with clients.
  • Developed web pages using the Struts framework, JSP, XML, JavaScript, Hibernate, Spring, HTML/DHTML, and CSS; configured the Struts application and used tag libraries.
  • Developed the application using Spring, Hibernate, Spring Batch, and both SOAP and RESTful web services.
  • Used the Spring Framework at the business tier and Spring's BeanFactory for initializing services.
  • Used AJAX, JavaScript to create interactive user interface.
  • Implemented client-side validations using JavaScript as well as server-side validations.
  • Developed a single-page application using AngularJS and Backbone.js.
  • Implemented Hibernate to persist data to the database and wrote HQL-based queries for CRUD operations on the data (a DAO sketch follows this list).
  • Developed an API to write XML documents from a database. Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
  • Database modeling, administration and development using SQL and PL/SQL in Oracle 11g.
  • Coded deployment descriptors using XML; the generated JAR files were deployed on the Apache Tomcat server.
  • Involved in the development of presentation layer and GUI framework in JSP. Client-Side validations were done using JavaScript.
  • Involved in configuring and deploying the application using WebSphere.
  • Involved in code reviews and mentored the team in resolving issues.
  • Undertook the Integration and testing of the various parts of the application.
  • Developed automated Build files using ANT.
  • Used Subversion for version control and log4j for logging errors.
  • Performed code walkthroughs and prepared test cases and test plans.
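
The Hibernate bullet above refers forward to this sketch: a minimal, hypothetical DAO showing an HQL read and a save, assuming Hibernate 5.2+ and a hibernate.cfg.xml on the classpath. The Customer entity and its fields are placeholders, not taken from any actual project.

```java
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

// Sketch of an HQL-based DAO; entity, fields, and query are illustrative.
public class CustomerDao {

    @Entity
    public static class Customer {
        @Id public Long id;
        public String name;
        public String city;
    }

    private final SessionFactory sessionFactory =
            new Configuration().configure().addAnnotatedClass(Customer.class).buildSessionFactory();

    public List<Customer> findByCity(String city) {
        Session session = sessionFactory.openSession();
        try {
            // HQL targets the mapped entity and its fields, not the table
            return session.createQuery("from Customer c where c.city = :city", Customer.class)
                    .setParameter("city", city)
                    .list();
        } finally {
            session.close();
        }
    }

    public void save(Customer customer) {
        Session session = sessionFactory.openSession();
        try {
            session.beginTransaction();
            session.saveOrUpdate(customer);      // insert or update by id
            session.getTransaction().commit();
        } finally {
            session.close();
        }
    }
}
```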

Environment: HTML5, JSP, Servlets, JDBC, JavaScript, JSON, jQuery, Spring, SQL, Oracle 11g, Tomcat 5, Eclipse IDE, XML, XSL, ANT.

Confidential

Associate Java Developer

Responsibilities:

  • Involved in the complete SDLC software development life cycle of the application from requirement gathering and analysis to testing and maintenance.
  • Developed the modules based on MVC Architecture.
  • Developed UI using JavaScript, JSP, HTML and CSS for interactive cross browser functionality and complex user interface.
  • Created business logic using servlets and session beans and deployed them on the Apache Tomcat server.
  • Created complex SQL Queries, PL/SQL Stored procedures and functions for back end.
  • Prepared the functional, design and test case specifications.
  • Performed unit testing, system testing and integration testing.
  • Developed unit test cases. Used JUnit for unit testing of the application.
  • Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing solutions; resolved high-priority defects per schedule.

Environment: Java, JSP, Servlets, Apache Tomcat, Oracle, JUnit, SQL
