Hadoop/Big Data Developer Resume
SUMMARY:
- Around 8 years of IT experience as a developer, designer, and quality reviewer, with cross-platform integration experience using Hadoop and Python.
- Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
- Strong understanding of Hadoop daemons and MapReduce concepts.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Flume, and Oozie.
- Good knowledge of loading data from Oracle and MySQL databases into HDFS using Sqoop (structured data) and Flume (log files and XML).
- Extensive experience in developing Pig Latin scripts and using Hive Query Language (HiveQL) for data analytics.
- Experienced in writing custom Hive UDFs to incorporate business logic into Hive queries.
- Knowledge of analyzing data interactively using Apache Spark and Apache Zeppelin.
- Good knowledge of Apache Storm and Kafka pipelines.
- Good experience in optimizing MapReduce jobs using mappers, reducers, combiners, and partitioners to deliver the best results on large datasets (see the sketch after this summary).
- Extensive experience in working with application servers like WebSphere, WebLogic and Tomcat.
- Strong understanding of NoSQL databases like Cassandra, HBase, and MongoDB.
- Good knowledge of job/workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Extensive experience in designing, developing, and supporting Model-View-Controller (MVC) applications using the Struts and Spring frameworks.
- Hands-on experience in application development using core Scala, RDBMS, and Linux shell scripting; developed UNIX shell scripts to automate various processes.
- Proficiency in using BI tools like Tableau/Pentaho.
- Experience in understanding Hadoop security requirements and integrating with the Kerberos Key Distribution Center (KDC).
- Extensive experience with RDBMS applications including Oracle, MS Access, and SQL Server.
- Detailed understanding of the Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile (Scrum).
- Well experienced in testing large, complex databases and in reporting and ETL tools like Informatica and DataStage.
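A minimal, illustrative sketch of the MapReduce tuning pattern referenced above (a mapper, a reducer reused as a combiner, and a custom partitioner); the job, class, and path names are hypothetical and not taken from any of the projects below:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCount {

  // Mapper: emits (token, 1) for every token in the input line.
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        word.set(token);
        context.write(word, ONE);
      }
    }
  }

  // Sums counts per key; reused as a combiner to shrink the shuffle.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  // Custom partitioner: hash-spreads keys so no single reducer is overloaded.
  public static class SpreadPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
      return (key.toString().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "event-count");
    job.setJarByClass(EventCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);           // local aggregation before the shuffle
    job.setReducerClass(SumReducer.class);
    job.setPartitionerClass(SpreadPartitioner.class);
    job.setNumReduceTasks(8);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Reusing the reducer as a combiner and spreading keys with the partitioner are the two knobs that usually matter most on large datasets, since they cut shuffle volume and avoid reducer skew.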
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Solr, Flume, Oozie, ZooKeeper, Kafka.
NoSQL Databases: HBase, MongoDB, Cassandra
Languages: C, Python, Pig Latin, Scala, HiveQL, Perl, Unix shell scripts
Frameworks: Struts, Spring, Spring XD, Hibernate
Operating Systems: Ubuntu Linux, Windows XP/Vista/7/10, macOS
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, WebSphere
Databases: Oracle, MySQL, PL/SQL, PostgreSQL
Tools and IDEs: Eclipse, Anaconda, Spyder
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Development Methodologies: Agile, Scrum, Waterfall
Highest Qualification: Bachelors
University: Vellore Institute of Technology
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop/Big Data Developer
Responsibilities:
- Worked on 100 TB of unstructured and semi-structured data; with a replication factor of 3, the total stored size was 300 TB.
- Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
- Used Pig as an ETL tool for transforming, filtering, joining events, and performing aggregations.
- Wrote UDFs and UDAFs for Hive (see the sketch after this list).
- Populated HDFS and Cassandra with large amounts of data using Apache Kafka.
- Worked on Spark stream processing to bring data into memory and implemented RDD transformations and actions to process it in units (see the Spark sketch after this list).
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Developed Hive queries for creating foundation tables from staged data.
- Used DML statements to perform different operations on Hive tables.
- Developed job flows to automate the workflow for Pig and Hive jobs.
- Worked with the Apache Crunch library to write, test, and run Hadoop MapReduce pipeline jobs.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Provided cluster coordination services through ZooKeeper.
- Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL.
- Extracted data from Teradata into HDFS using Sqoop.
- Adjusted the minimum share of maps and reducers for all the queues.
- Used Tableau for visualizing the data reported from Hive tables.
- Worked with SequenceFile, RCFile, Avro, and HAR file formats.
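A minimal sketch of a custom Hive UDF of the kind described above; the class name and the normalization rule are illustrative assumptions, not the production logic:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example: normalizes a customer id before it is joined in HiveQL.
public final class NormalizeId extends UDF {
  public Text evaluate(Text input) {
    if (input == null) {
      return null;
    }
    // Illustrative business rule: trim, upper-case, and strip a legacy "CUST-" prefix.
    String cleaned = input.toString().trim().toUpperCase();
    if (cleaned.startsWith("CUST-")) {
      cleaned = cleaned.substring("CUST-".length());
    }
    return new Text(cleaned);
  }
}
```

Packaged into a jar, a function like this is registered with ADD JAR and CREATE TEMPORARY FUNCTION and then called inline from HiveQL; a UDAF follows the same pattern but implements the aggregation interfaces instead.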
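And a hedged sketch of the RDD transformation/action pattern mentioned above, written against the Spark Java API; the HDFS paths and the field layout of the input lines are assumptions:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class EventCountsBySeverity {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("event-counts");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Load staged events from HDFS (path is illustrative) and keep them in memory.
    JavaRDD<String> events = sc.textFile("hdfs:///staging/events/*.log").cache();

    // Transformations: drop malformed rows, key each event by its severity field.
    JavaPairRDD<String, Integer> bySeverity = events
        .filter(line -> line.split(",").length > 2)
        .mapToPair(line -> new Tuple2<>(line.split(",")[2], 1))
        .reduceByKey(Integer::sum);

    // Action: materialize the counts and write them back to HDFS.
    bySeverity.saveAsTextFile("hdfs:///analytics/event_counts_by_severity");
    sc.stop();
  }
}
```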
Environment: Hadoop, HDFS, Apache Crunch, MapReduce, Hive, Flume, Sqoop, ZooKeeper, Kafka, Storm, Cassandra, Spark, Puppet, Linux.
Confidential
Hadoop Developer
Responsibilities:
- Worked on Kafka-Storm on the HDP 2.2 platform for real-time analysis (see the sketch after this list).
- Created a PoC to store server log data in MongoDB to identify system alert metrics.
- Implemented a Hadoop framework to capture user navigation across the application to validate the user interface and provide analytic feedback/results to the UI team.
- Developed MapReduce jobs using the Java API and Pig Latin.
- Loaded dynamically generated files into the cluster using Flume and exported data from the cluster to relational database management systems using Sqoop.
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Worked on Oozie to define and schedule Apache Hadoop jobs as directed acyclic graphs (DAGs) of actions with control flows.
- Involved in creating Hive tables, working on them using HiveQL, and performing data analysis using Hive and Pig.
- Responsible for managing data from multiple sources.
- Wrote Pig scripts to run ETL jobs on the data in HDFS for future testing.
- Used Hive to analyze the data and checked for correlation.
- Used Sqoop to import data from MySQL into HDFS and Hive on a regular basis.
- Automated the regular Sqoop imports into Hive partitions using Apache Oozie.
- Supported Map Reduce Programs that are running on the cluster.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Used Agile methodology in developing the application, which included iterative application development, weekly status reports, and stand-up meetings.
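An illustrative sketch of the Kafka producer side of a Kafka-Storm pipeline like the one described above; the broker addresses, topic name, and sample log line are placeholders, not project artifacts:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ServerLogProducer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder brokers
    props.put("acks", "1");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    Producer<String, String> producer = new KafkaProducer<>(props);
    // Each log line is published to the topic that the Storm topology consumes.
    String logLine = "2015-08-01T10:15:00 host42 WARN disk usage above threshold";
    producer.send(new ProducerRecord<>("server-logs", "host42", logLine));
    producer.close();
  }
}
```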
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Flume, ZooKeeper, Agile, Cloudera Manager, Oozie, MySQL, SQL, Linux
Confidential
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Moved data from Oracle to HDFS and from HDFS back to Oracle using Sqoop.
- Worked on loading and transforming large sets of semi-structured data using Pig Latin operations.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning and failure conditions.
- Imported and exported data to and from HDFS and Hive using Sqoop.
- Wrote Apache Pig scripts to process data in HDFS.
- Clustered customer categories based on offers using Apache Hive.
- Performed grouping, aggregation, and sorting using Pig and Hive, which are higher-level abstractions over MapReduce.
- Wrote Pig UDFs to pre-process data for analysis (see the sketch after this list).
- Wrote Hive queries for data analysis to meet the business requirements.
- Developed Oozie workflows to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Extensive experience in performance tuning of Oracle queries.
- Tested and validated Hadoop Log files.
- Created data models for customer data using Cassandra Query Language (CQL) (see the CQL sketch after this list).
- Worked in monitoring, managing and troubleshooting the Hadoop Log files.
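An illustrative sketch of a Pig UDF of the kind referenced above; the class name and the cleaning rule are hypothetical:

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical pre-processing UDF: trims and lower-cases a raw text field
// before it is grouped and aggregated in the Pig script.
public class CleanField extends EvalFunc<String> {
  @Override
  public String exec(Tuple input) throws IOException {
    if (input == null || input.size() == 0 || input.get(0) == null) {
      return null;
    }
    return input.get(0).toString().trim().toLowerCase();
  }
}
```

In the Pig script, such a function would be registered with REGISTER, aliased with DEFINE, and applied inside a FOREACH ... GENERATE statement.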
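And a hedged sketch of a customer data model expressed in CQL, issued here through the DataStax Java driver listed in the environment; the keyspace, table, and column names are assumptions:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CustomerModel {
  public static void main(String[] args) {
    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect();

    session.execute("CREATE KEYSPACE IF NOT EXISTS crm WITH replication = "
        + "{'class': 'SimpleStrategy', 'replication_factor': 3}");

    // Partitioned by customer_id so all of a customer's offers land on one partition,
    // clustered by offer_id so the newest offers are read first.
    session.execute("CREATE TABLE IF NOT EXISTS crm.customer_offers ("
        + "customer_id uuid, offer_id timeuuid, category text, amount decimal, "
        + "PRIMARY KEY (customer_id, offer_id)) "
        + "WITH CLUSTERING ORDER BY (offer_id DESC)");

    cluster.close();
  }
}
```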
Environment: Apache Hadoop, Hive, Cassandra, DataStax, Oracle 11g/10g, MySQL, UNIX, Oozie
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in analysis and design of the application.
- Involved in preparing the detailed design document for the project.
- Developed the application using J2EE architecture.
- Involved in developing JSP forms.
- Designed and developed web pages using HTML and JSP.
- Designed various applets using JBuilder.
- Designed and developed Servlets to communicate between presentation and business layer.
- Used EJB as a middleware in developing a three-tier distributed application.
- Developed session beans and entity beans for business and data processing.
- Used JMS in the project for sending and receiving the messages on the queue.
- Developed the Servlets for processing the data on the server.
- Transferred the processed data to the database through entity beans.
- Used JDBC for database connectivity with MySQL Server (see the sketch after this list).
- Used CVS for version control.
- Involved in unit testing using JUnit.
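A minimal sketch of the servlet-plus-JDBC pattern used in this project; the servlet name, form fields, table, and connection details are assumptions, not the original code:

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Processes a submitted JSP form on the server and persists it to MySQL over JDBC.
public class RegistrationServlet extends HttpServlet {
  @Override
  protected void doPost(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    String name = request.getParameter("name");
    String email = request.getParameter("email");

    try {
      Class.forName("com.mysql.jdbc.Driver");
      Connection conn = DriverManager.getConnection(
          "jdbc:mysql://localhost:3306/appdb", "appuser", "secret"); // placeholder credentials
      PreparedStatement stmt =
          conn.prepareStatement("INSERT INTO customers (name, email) VALUES (?, ?)");
      stmt.setString(1, name);
      stmt.setString(2, email);
      stmt.executeUpdate();
      stmt.close();
      conn.close();
    } catch (Exception e) {
      throw new ServletException("Failed to save registration", e);
    }

    // Hand control back to the presentation layer.
    request.getRequestDispatcher("/confirmation.jsp").forward(request, response);
  }
}
```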
Environment: Core Java, J2EE, JSP, Servlets, XML, XSLT, EJB, JDBC, JBuilder 8.0, JBoss, Swing, JavaScript, JMS, HTML, CSS, MySQL Server, CVS, Windows 2000