Hadoop Developer Resume
San Jose, CA
SUMMARY:
- Over 7 years of experience in all phases of software application development, including requirements analysis, design, development and maintenance of Hadoop/Big Data applications using Spark, Kafka, DynamoDB, SQS, S3, EMR, Hive, Sqoop and Redis, and of applications written in Java and Scala tailored to industry needs.
- Hands on experience with Spark Core, Spark SQL, Spark Streaming.
- Used Spark SQL to perform transformations and actions on data residing in Hive.
- Used Kafka & Spark Streaming for real-time processing.
- Able to provision AWS resources such as EC2, EBS, S3, SQS, SNS, Lambda, Redis and EMR using CloudFormation templates.
- Hands-on experience integrating Amazon DynamoDB with Spark.
- Ensured data integrity and data security on AWS by implementing AWS best practices.
- Templated AWS infrastructure as code with Terraform to build out staging and production environments.
- Deployed EC2 instances, used EBS volumes for persistent storage and managed access with AWS IAM.
- 3+ years of hands-on experience with Big Data ecosystems including Hadoop and YARN, Spark, Kafka, DynamoDB, Redshift, SQS, SNS, Hive, Sqoop, Flume, Pig, Oozie, MapReduce and ZooKeeper across industries such as finance and healthcare.
- Good knowledge of Teradata.
- Experience migrating data from RDBMS and unstructured sources into HDFS, and from HDFS back to RDBMS, using Sqoop.
- Hands-on experience coding MapReduce/YARN programs in Java and Scala for big data analysis.
- Good knowledge of Apache Spark data processing to handle data from RDBMS and streaming sources with Spark Streaming.
- Experience in data warehousing and ETL processes, with strong database, SQL, ETL and data analysis skills.
- Experience developing HiveQL scripts for data analysis and ETL, and extending Hive's default functionality by writing User Defined Functions (UDFs) for data-specific processing.
- Experience designing and developing POCs in Scala, deploying them on the YARN cluster and comparing the performance of Spark with Hive and SQL/Teradata.
- Extensive experience using Flume to transfer log data files to the Hadoop Distributed File System (HDFS).
- Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode and DataNode, and of the MapReduce programming paradigm.
- Tested various Flume agents and data ingestion into HDFS, including retrieving and validating Snappy-compressed files.
- Good skills in writing Spark jobs in Scala to process large sets of structured and semi-structured data and store them in HDFS.
- Good knowledge of using Spark SQL to load tables into HDFS and run SELECT queries on top of them.
- Experience with NoSQL databases such as Cassandra and HBase (column-oriented) and MongoDB (document-oriented), and their integration with the Hadoop cluster.
- Experience in writing Hive Queries for processing and analyzing large volumes of data.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Developed Oozie workflows integrating all tasks related to a project and scheduled the jobs as required.
- Automated all jobs that pull data from upstream servers and load it into Hive tables using Oozie workflows.
- Experience with scripting languages such as Linux/Unix shell scripting.
- Implemented optimization mechanisms such as Combiners, Distributed Cache, data compression and custom Partitioners to speed up jobs.
TECHNICAL SKILLS:
Big Data/Hadoop: HDFS, Hive, Pig, Sqoop, Oozie, Flume, Impala, ZooKeeper, Kafka, MapReduce, Cloudera, Amazon EMR.
Spark Components: Spark Core, Spark SQL, Spark Streaming.
Programming Languages: SQL, Scala, Java, Python and Unix Shell Scripting.
Databases & NoSQL: MySQL, Pig Latin, HiveQL, Teradata, RDBMS.
Cloud: Amazon EMR, EC2, EBS, S3, SQS, SNS, Lambda, Redshift.
Operating Systems: Windows, Unix, Red Hat Linux.
PROFESSIONAL EXPERIENCE:
Confidential - San Jose, CA
Hadoop Developer
Responsibilities:
- Interacting with multiple teams to understand their business requirements and design flexible, common components.
- Validating source files for data integrity and data quality by reading header and trailer information and performing column validations.
- Used Spark SQL to access Hive tables from Spark for faster data processing.
- Used Hive to perform transformations, joins, filtering and pre-aggregations before storing the data.
- Visualized selected data sets with PySpark in Jupyter Notebook.
- Validating and visualizing the data in Tableau.
- Used the Spark API on Cloudera Hadoop YARN to perform analytics on data.
- Working with offshore and onsite teams to stay in sync.
- Developed a POC using Scala, Spark SQL and MLlib along with Kafka and other tools as required, then deployed it on the YARN cluster.
- Using Hive extensively to create views for the feature data.
- Creating and maintaining automation jobs for different data sets.
- Working closely with the platform and Hadoop teams to address the team's needs.
- Using Kafka for data ingestion across different data sets.
- Imported and exported data into HDFS and assisted in exporting analyzed data to RDBMS using Sqoop.
- Created Sentry policy files to give business users access to the required databases and tables through Impala in the dev, UAT and prod environments.
- Created and validated Hive views using Hue.
- Created a deployment document and a user manual for validating the data set.
- Created a data dictionary for the Universal data sets.
- Created shell scripts and TES job scheduler jobs to automate workflow execution for loading the different data sets.
- Developed Sqoop jobs to import data from RDBMS and file servers into Hadoop.
- Created heat maps, bar charts and plots using pyplot, NumPy, pandas and SciPy in Jupyter Notebook.
- Working closely with the EDNA and platform teams on team needs and platform issues.
Confidential - Richmond, VA
Spark/Hadoop Developer
Responsibilities:
- Interacting with multiple teams to understand their business requirements and design flexible, common components.
- Validating source files for data integrity and data quality by reading header and trailer information and performing column validations.
- Used Spark SQL to create DataFrames and transform them, for example by applying schemas manually, casting columns and joining DataFrames, before storing them.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster and compared the performance of Spark with Hive and SQL/Teradata.
- Used Spark SQL to access Hive tables from Spark for faster data processing.
- Worked on Spark Streaming with Apache Kafka for real-time data processing.
- Created Kafka producers and consumers for Spark Streaming.
- Used Hive to perform transformations, joins, filtering and pre-aggregations before storing the data in HDFS.
- Installed and configured Hadoop, YARN, MapReduce, Flume and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Python for data cleaning.
- Used Sqoop to import and export data between Netezza/Teradata and HDFS/Hive.
- Worked with three storage layers: raw, intermediate and publish.
- Created external Hive tables to store and query the loaded data.
- Applied optimization techniques such as partitioning and bucketing.
- Used the Avro file format with Snappy compression in intermediate tables for faster data processing.
- Used the Parquet file format for published tables and created views on those tables.
- Created Sentry policy files to give business users access to the required databases and tables through Impala in the dev, UAT and prod environments.
- Automated the jobs with Oozie and scheduled them with Autosys.
- Used AWS to spin up EMR clusters to process large volumes of data stored in S3 and push the processed data to HDFS.
- Participated in evaluation and selection of new technologies to support system efficiency.
- Participated in development and execution of system and disaster recovery processes.
Environment: Hadoop, Cloudera, Amazon AWS, HDFS, Hive, Impala, Spark, Autosys, Kafka, DynamoDB, Lambda, S3, SQS, SNS, Sqoop, Pig, Java, Scala, Eclipse, Tableau, Teradata, UNIX, Maven and SBT.
Confidential, Albany, NY
Hadoop Developer
Responsibilities:
- Imported and exported data into HDFS using Sqoop, including incremental loads.
- Developed Hive queries for processing and data manipulation.
- Worked on optimizing and tuning Hive to achieve optimal performance.
- Defined and managed job flows using Oozie.
- Responsible for managing data coming from different sources.
- Supported Hive programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Hands-on experience with Oozie job scheduling.
- Involved in creating Hive tables, loading data and writing Hive queries.
- Worked closely with AWS to migrate entire data centers to the cloud using VPC, EC2, S3, EMR, RDS, Splice Machine and DynamoDB.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java, Scala, Hortonworks, Amazon EMR, EC2, S3.
Confidential, Santa Monica, CA
Hadoop Developer
Responsibilities:
- Developed MapReduce jobs using Java API.
- Wrote MapReduce jobs using Pig Latin.
- Developed workflow using Oozie for running MapReduce jobs and Hive Queries.
- Worked on cluster coordination services through ZooKeeper.
- Worked on loading log data directly into HDFS using Flume.
- Involved in loading data from LINUX file system to HDFS.
- Responsible for managing data coming from multiple sources, applying BI/DW and analytics knowledge.
- Experienced in running Hadoop streaming jobs to process terabytes of XML data.
- Imported and exported data into HDFS and assisted in exporting analyzed data to relational databases using Sqoop.
- Implemented JMS for asynchronous auditing purposes.
- Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts
- Experience defining, designing and developing Java applications, especially with Hadoop MapReduce, leveraging frameworks such as Cascading and Hive.
- Documented designs and procedures for building and managing Hadoop clusters.
- Strong experience troubleshooting operating system issues, cluster issues and Java-related bugs.
- Experience automating deployment, management and self-serve troubleshooting of applications.
- Defined and evolved the existing architecture to scale with growth in data volume, users and usage.
- Designed and developed a Java API (Commerce API) that connects to Cassandra through Java services.
- Installed and configured Hive and wrote Hive UDFs.
- Managed the CVS repository and migrated it to Subversion.
Environment: Hadoop, HDFS, Hive, Flume, Sqoop, Pig, Eclipse, MySQL, Red Hat, Java (JDK 1.6).
Confidential
Java Developer
Responsibilities:
- Involved in preparing design, development and analysis documents shared with clients.
- Developed web pages using the Struts framework, JSP, XML, JavaScript, Hibernate, Spring, HTML/DHTML and CSS; configured the Struts application and used tag libraries.
- Developed the application using Spring, Hibernate and Spring Batch, along with SOAP and RESTful web services.
- Used the Spring Framework in the business tier and Spring's BeanFactory for initializing services.
- Used AJAX, JavaScript to create interactive user interface.
- Implemented client-side validations using JavaScript, as well as server-side validations.
- Developed single-page applications using AngularJS and Backbone.js.
- Implemented Hibernate to persist the data into Database and wrote HQL based queries to implement CRUD operations on the data.
- Developed an API to write XML documents from a database. Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
- Database modeling, administration and development using SQL and PL/SQL in Oracle 11g.
- Wrote deployment descriptors in XML and deployed the generated JAR files on Apache Tomcat Server.
- Involved in the development of presentation layer and GUI framework in JSP. Client-Side validations were done using JavaScript.
- Involved in configuring and deploying the application using WebSphere.
- Involved in code reviews and mentored the team in resolving issues.
- Undertook the Integration and testing of the various parts of the application.
- Developed automated build files using Ant.
- Used Subversion for version control and log4j for logging errors.
- Conducted code walkthroughs and prepared test cases and test plans.
Environment: HTML5, JSP, Servlets, JDBC, JavaScript, JSON, jQuery, Spring, SQL, Oracle 11g, Tomcat 5, Eclipse IDE, XML, XSL, ANT.
Confidential
Associate Java Developer
Responsibilities:
- Involved in the complete software development life cycle (SDLC) of the application, from requirements gathering and analysis to testing and maintenance.
- Developed the modules based on MVC Architecture.
- Developed UI using JavaScript, JSP, HTML and CSS for interactive cross browser functionality and complex user interface.
- Created business logic using servlets and session beans and deployed them on Apache Tomcat server.
- Created complex SQL Queries, PL/SQL Stored procedures and functions for back end.
- Prepared the functional, design and test case specifications.
- Performed unit testing, system testing and integration testing.
- Developed unit test cases. Used JUnit for unit testing of the application.
- Provided technical support for production environments: resolved issues, analyzed defects, and provided and implemented fixes. Resolved high-priority defects on schedule.
Environment: Java, JSP, Servlets, Apache Tomcat, Oracle, JUnit, SQL
