Senior Hadoop Developer Resume
San Jose, CA
SUMMARY:
- Over 8 years of experience in all phases of the Software Development Life Cycle, including requirements gathering and analysis, design, development, integration, documentation, testing, build and deployment of web and enterprise applications, and implementation of Big Data solutions using Hadoop.
- 4 years of experience in building solutions for Big Data problems using HDFS, MapReduce, Pig, Hive, Sqoop, ZooKeeper, Flume and Oozie.
- Experience in using various Hadoop components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume and Storm for data storage and streaming analysis.
- Hands-on experience in installing, configuring and using Hadoop ecosystem components like MapReduce, HDFS, HBase, Oozie, Tez, Hive, Sqoop, Pig, ZooKeeper and Flume.
- Experience in importing and exporting data using Sqoop from HDFS/Hive/HBase to relational database systems and vice versa.
- Proficient in HiveQL, Pig and SQL scripting and query optimization.
- Experience in using Kafka as a distributed publisher-subscriber messaging system.
- Strong experience working with real-time streaming applications and batch-style, large-scale distributed computing applications using tools like Spark Streaming, Flume and Hive.
- Development experience with Big Data/NoSQL platforms, such as MongoDB and Cassandra.
- Migrated RDBMS databases to various NoSQL databases.
- Hands-on data warehousing experience with Extraction, Transformation and Loading (ETL) processes using Talend Open Studio for Data Integration.
- Hands-on experience in data cleaning and preprocessing using Java and the Talend Data Preparation tool.
- Good knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Worked on implementing enterprise applications built on top of search engines like Solr and Elasticsearch.
- Excellent programming skills at a higher level of abstraction using Scala and Spark.
- Good working experience in PySpark and Spark SQL.
- Experience with databases like DB2, Oracle, MySQL and SQL Server.
- Proficient in using various IDEs like Eclipse and NetBeans.
- Expertise in design and development of Web Applications involving J2EE technologies with Java, Spring, EJB, AJAX, Hibernate, JSP, Struts, PL/SQL, Web Services, XML, JMS and JDBC.
- Familiar with data architecture, including data ingestion pipeline design, data modeling, data mining and advanced data processing.
- Extensive experience in solving analytical problems with quantitative approaches and machine learning methods in R.
- Excellent problem-solving skills and the ability to rapidly absorb new skills and adapt to new organizational contexts.
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Tez, Hive, Pig, Sqoop, Oozie, Flume, Cassandra, Spark, Kafka, Apache Mahout, Solr
Programming Languages: Core Java, Python, Scala, SQL, PL/SQL, HiveQL, XML, R, C++
Databases: SQL Server, Oracle, DB2, MySQL, MongoDB, Teradata
Tools: Eclipse, NetBeans, Tableau, Rational Rose, QMF, Talend, Endevor, Toad
Operating Systems: Unix, Linux, Windows, MVS, OS/390, Z/OS
Methodologies: Agile, Waterfall
J2EE Technologies: Spring, Struts, Hibernate, JMS, JNDI, Web Services, Servlet 2.0 and JAXB
PROFESSIONAL EXPERIENCE:
Confidential, San Jose, CA
Senior Hadoop Developer
Responsibilities:
- Automated the extraction of data from warehouses and weblogs by developing Oozie workflows and coordinator jobs.
- Developed complex Hive queries and UDFs.
- Read multiple data formats from HDFS using PySpark.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed SQL scripts and designed solutions to implement them in PySpark.
- Loaded data from the UNIX file system into HDFS.
- Extracted data from Teradata into HDFS using Sqoop.
- Migrated MapReduce programs to Spark transformations using Spark and Scala.
- Imported data from various data sources, performed transformations using Hive, MapReduce and Spark, and loaded the data into HDFS.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Responsible for the analysis, design and testing phases and for documenting technical specifications.
- Used Talend Administration Center (TAC) for scheduling jobs and data integration.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Used HCatalog to access Hive table metadata from Pig code.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external tables to optimize performance.
- Developed regression models in R for statistical analysis.
- Moved data from other databases to Cassandra, applying basic knowledge of Cassandra data modeling.
- Worked on the core and Spark SQL modules of Spark extensively.
- Ran Hadoop streaming jobs to process terabytes of data.
- Imported real-time data into Hadoop using Kafka and implemented the associated Oozie job.
Confidential, Bellevue, WA
Hadoop Developer
Responsibilities:
- Gathered requirements from the client and coordinated with onsite, offshore and client teams.
- Experience in Hortonworks Data Platform (HDP) 2.2, MapReduce, Pig, Hive, Sqoop, Control-M, HBase and Storm.
- Worked with large data sets on a large cluster.
- Strong knowledge of data mining and data warehousing.
- Used RabbitMQ as the messaging system.
- Prepared and processed data for loading into HBase.
- Loaded data into Hive and retrieved it from Hive tables using HiveQL.
- Worked on loading the raw data extracts into Hive tables.
- Worked on creating external and managed tables in Hive.
- Designed HBase Schema, created HBase tables and loaded the historical data into HBase tables.
- Loaded data into HBase tables using the HBase Put method and HBase bulk-loading methods.
- Updated the HBase tables daily using Oozie.
- Worked on HBase and Hive integration and loaded the data into HBase tables.
- Built Tableau dashboards to present the data to senior management.
- Worked on project related documentation in Confluence.
- Experience in offshore and onsite coordination.
Confidential, Boston, MA
Hadoop Developer
Responsibilities:
- Involved in requirements gathering and business analysis, and translated business requirements into technical designs for Hadoop and Big Data.
- Imported data into HDFS from databases and exported it back using Sqoop.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Wrote MapReduce code to process and parse data from various sources and stored the parsed data in HBase and Hive using HBase-Hive integration.
- Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
- Created workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Developed shell scripts and automated data management as part of end-to-end integration work.
- Used Pig as an ETL tool for transformations, joins and some pre-aggregations before storing data in HDFS.
- Developed MapReduce programs to parse information and load it into HDFS.
- Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in Hive queries.
- Automated and scheduled Sqoop jobs using Unix shell scripts.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig and Sqoop.
- Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
- Used HBase to store the majority of the data, which needed to be divided by region.
- Developed MapReduce programs for data analysis and data cleaning.
Confidential
Java/JEE Developer
Responsibilities:
- Involved in various phases of the Software Development Life Cycle.
- Interacted with all modules of the project, gathered the batch-related requirements and designed accordingly.
- Used Eclipse as IDE for application development.
- Created and maintained the configuration of the Spring Application Framework.
- Wrote Spring configuration XML files containing bean declarations and the declarations of their dependent objects.
- Designed and developed GUI using JSP, HTML, DHTML and CSS.
- Worked with JMS for messaging interface.
- Developed the UI using Java, with Oracle 10g as the backend, accessed through TOAD.
- Used Log4j extensively for application logging.
- Used Subversion as the version control system.
- Responsible for understanding the scope of the project and requirement gathering.
- Used the Tomcat web server for development purposes.
- Involved in creation of Test Cases for JUnit Testing.
- Used Oracle as the database and Toad for query execution, and wrote SQL scripts and PL/SQL code for procedures and functions.
- Used CVS as configuration management tool for code versioning and release.
- Developed the application in Eclipse and used Maven as the build and deployment tool.
- Used Log4j to print debug, warning and info messages to the server console.
- Performed unit testing using JUnit.
- Involved in scheduling all the batch tasks to run in different environments.
- Used JMS to send and receive messages in XML format.
- Configured the data source to access the Oracle database using the JDBC provider for Oracle in the application server.
- Involved in maintenance and production support.
