Hadoop Developer Resume
San Jose, CA
SUMMARY
- Over 8 years of professional IT experience, including 7 years in the Big Data ecosystem covering ingestion, querying, processing, and analysis of big data.
- Experience using Hadoop ecosystem components such as MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, and Flume on the Cloudera distribution.
- Knowledge of NoSQL databases such as HBase and Cassandra.
- Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
- Experience working with MapReduce programs, Pig scripts, and Hive queries to deliver reliable results.
- Competent with Big Data frameworks such as Kafka, Neo4j, Hive, Elasticsearch, HDFS, and YARN, as well as data visualization libraries such as D3.js.
- Extensively worked on development and optimization of MapReduce programs, Pig scripts, and Hive queries to create structured data for data mining.
- Solid knowledge of Hadoop architecture and daemons such as NameNode, DataNodes, JobTracker, and TaskTrackers.
- Good knowledge of ZooKeeper for cluster coordination.
- Experience in database design, data analysis, and programming of SQL, PL/SQL stored procedures, and triggers in Oracle and SQL Server.
- Experience extending Hive and Pig core functionality with custom user-defined functions (UDFs); an illustrative registration sketch follows this summary.
- Experience in writing custom classes, functions, procedures, problem management, library controls and reusable components.
- Working knowledge of Oozie, a workflow scheduler used to manage Pig, Hive, and Sqoop jobs.
- Followed test-driven development within Agile/Scrum methodology to produce high-quality software.
- Expert in AWS CloudFormation template creation.
- Experience in AWS EMR cluster configuration
- Experience in AWS cloud environment on S3 storage and EC2 instances.
- Experience with RStudio, creating visualizations from data files.
- Experienced in integrating Java-based web applications in a UNIX environment.
- Developed applications using Java, JSP, Servlets, JDBC, JavaScript, XML and HTML.
- Strong analytical skills with the ability to quickly understand clients' business needs. Involved in meetings to gather information and requirements from clients.
- Research-oriented, motivated, proactive, self-starter with strong technical, analytical and interpersonal skills.
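Illustrative sketch (not taken from these projects): how a custom Hive UDF is typically registered and invoked from the shell. The jar path, class name, function name, and table are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical example: register a custom Hive UDF and call it in a query.
set -euo pipefail

UDF_JAR=/opt/udfs/custom-udfs.jar   # jar containing the UDF class (assumed path)

hive -e "
  ADD JAR ${UDF_JAR};
  CREATE TEMPORARY FUNCTION clean_text AS 'com.example.hive.udf.CleanText';
  SELECT clean_text(raw_line) FROM staging.web_logs LIMIT 10;
"
```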
TECHNICAL SKILLS
Hadoop Ecosystem: Kafka, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Flume, Oozie, ZooKeeper, Ambari, Hue, Spark, Storm, Ganglia
Project Management / Tools / Applications: All MS Office suites (incl. 2003), MS Exchange & Outlook, Lotus Domino Notes, Citrix Client, SharePoint, MS Internet Explorer, Firefox, Chrome, Apache, IIS
Web Technologies: HTML, XML, CSS, JavaScript
NoSQL Databases: HBase, Cassandra
Databases: Oracle 8i/9i/10g, MySQL
Languages: Java, SQL, PL/SQL, Ruby, Shell Scripting
Operating Systems: UNIX (OS X, Solaris), Windows, Linux (CentOS, Fedora, Red Hat)
IDE Tools: Eclipse, NetBeans
Application Server: Apache Tomcat
PROFESSIONAL EXPERIENCE
Confidential - San Jose, CA
Hadoop Developer
Responsibilities:
- Performed data-science activities and developed scatter plots using RStudio.
- Created automated Python scripts to validate data flow through Elasticsearch.
- Set up projects/tenants with Keystone user roles (see the OpenStack command sketch after this section).
- Worked in an AWS cloud environment with S3 storage and EC2 instances.
- Created the network, router, and subnet.
- Evaluated and analyzed the Hadoop cluster and various big data analytics tools, including Pig, the HBase database, and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Loaded data from the Linux file system into the Hadoop Distributed File System (HDFS).
- Created HBase tables to store various formats of PII data coming from different portfolios.
- Managed and reviewed Hadoop log files.
- Created instances in OpenStack to set up the environment.
- Set up the ELK (Elasticsearch, Logstash, Kibana) cluster.
- Troubleshot Nova and Glance issues in OpenStack as well as the Kafka and RabbitMQ message buses.
- Performance-tested the environment by creating Python scripts to generate I/O and CPU load.
- Experience with the OpenStack cloud platform.
- Provisioned hosts with flavors: GP (general purpose), SO (storage optimized), MO (memory optimized), and CO (compute optimized).
Environment: OpenStack, Elasticsearch, Logstash, Ansible, RHEL 7, Python, Kafka, StreamSets, InfluxDB, Sensu, RabbitMQ, Uchiwa, Kibana, Hive, Pig, HBase, Sqoop.
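Illustrative sketch (all names and CIDRs are hypothetical): the typical OpenStack CLI steps behind the project/tenant, Keystone role, and network/subnet/router setup described above.

```bash
#!/usr/bin/env bash
# Hypothetical example of project, role, and network provisioning in OpenStack.
set -euo pipefail

openstack project create demo-tenant
openstack role add --project demo-tenant --user demo-user member   # role name assumed

openstack network create demo-net
openstack subnet create demo-subnet --network demo-net --subnet-range 10.0.0.0/24
openstack router create demo-router
openstack router set demo-router --external-gateway public         # external net assumed
openstack router add subnet demo-router demo-subnet
```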
Confidential - Boston, MA
Hadoop Developer
Responsibilities:
- Handled importing of data from various data sources, performed transformations using Hive and Spark, and loaded data into HDFS.
- Imported and exported data between relational databases and HDFS using Sqoop (see the import sketch after this section).
- Worked on reading multiple data formats on HDFS using Scala.
- Implemented complex Hive queries using joins to optimize performance.
- Applied Hive partitioning and bucketing concepts and designed both managed and external tables.
- Developed and executed shell scripts to automate jobs.
- Developed multiple POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark against Cassandra and SQL.
- Converted Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Analyzed Cassandra/SQL scripts and designed the solution for implementation in Scala.
- Imported log files using Flume and performed load testing on them.
- Worked with JSON based REST Web services and Amazon Web Services (AWS).
- Performed load testing on AWS.
- Worked extensively on the Spark Core and Spark SQL modules.
- Ran Hadoop streaming jobs to process terabytes of data.
- Imported real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports.
- Involved in requirement analysis, design, build, testing phases and responsible for documenting technical specifications.
Environment & Tools: Hadoop, HDFS, AWS, Hive, Scala, Sqoop, Spark, SQL, Cassandra, Oozie, Tableau.
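Illustrative sketch (connection string, credentials, table, and target directory are hypothetical): a typical Sqoop import of a relational table into HDFS, as referenced above.

```bash
#!/usr/bin/env bash
# Hypothetical example of a Sqoop import from MySQL into HDFS.
set -euo pipefail

sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user \
  --password-file hdfs:///user/etl/.db_password \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4 \
  --fields-terminated-by '\t'
```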
Confidential - Boston, MA
Hadoop Developer
Responsibilities:
- Installed and configured Pig and wrote Pig Latin scripts.
- Managed and reviewed Hadoop JobTracker log files and Control-M log files.
- Scheduled and managed cron jobs and wrote shell scripts to generate alerts.
- Monitored and managed daily jobs processing around 200k files per day, tracking them through RabbitMQ and an Apache dashboard application.
- Used the Control-M scheduling tool to schedule daily jobs.
- Administered and maintained a multi-rack Cassandra cluster.
- Monitored workload, job performance and capacity planning using InsightIQ storage performance monitoring and storage analytics, experienced in defining job flows.
- Gained experience with NoSQL databases such as Cassandra and HBase.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream log data from servers and sensors.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- Created internal or external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency (see the Hive sketch after this section).
- Worked on setting up high availability for GPHD 2.2 with ZooKeeper and quorum journal nodes.
- Used the Control-M scheduling tool to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Worked with BI teams in generating the reports and designing ETL workflows on Tableau.
- Participated in Scrum calls, grooming, and demo meetings; strong experience with Agile methodology.
Environment: Apache Hadoop 2.3, GPHD 1.2, GPHD 2.2, MapReduce 2.3, HDFS, Hive, Java 1.6 & 1.7, Cassandra, Pig, Spring XD, Linux, Eclipse, RabbitMQ, ZooKeeper, PostgreSQL, Apache Solr, Control-M, Redis, Tableau, QlikView, DataStax.
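Illustrative sketch (database, table, columns, and paths are hypothetical): creating an external, partitioned Hive table and loading it with dynamic partitions, as referenced above.

```bash
#!/usr/bin/env bash
# Hypothetical example of an external Hive table with dynamic partition loading.
set -euo pipefail

hive -e "
  SET hive.exec.dynamic.partition=true;
  SET hive.exec.dynamic.partition.mode=nonstrict;

  CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events (
    user_id    STRING,
    event_type STRING,
    payload    STRING
  )
  PARTITIONED BY (event_date STRING)
  STORED AS ORC
  LOCATION '/data/warehouse/events';

  INSERT OVERWRITE TABLE analytics.events PARTITION (event_date)
  SELECT user_id, event_type, payload, event_date
  FROM staging.events_raw;
"
```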
Confidential, NC
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Installed and configured Pig and developed Pig Latin scripts.
- Involved in managing and reviewing Hadoop log files.
- Exported data from HDFS to Teradata on a regular basis using Sqoop (see the export sketch after this section).
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet business requirements.
- Created Hive tables and worked with them using HiveQL.
- Experienced in defining job flows.
- Gained experience with NoSQL databases such as Cassandra.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.
- Setup and benchmarked Hadoop clusters for internal use.
- Worked with BI teams in generating the reports and designing ETL workflows on Tableau.
- Monitored the log flow from LM Proxy to ES-Head.
- Used secportal as the front end of Gracie, where the search operations are performed.
- Wrote the MapReduce code for the flow from Flume to ES-Head.
Environment: Cloudera Hadoop (CDH 4.4), MapReduce, HDFS, Hive, Java 6, Pig, Cassandra, Linux, XML, MySQL, MySQL Workbench, Eclipse, PL/SQL, SQL connector, Subversion.
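Illustrative sketch (JDBC URL, credentials, table, and export directory are hypothetical, and a Teradata connector for Sqoop is assumed to be installed): a typical scheduled Sqoop export from HDFS to Teradata, as referenced above.

```bash
#!/usr/bin/env bash
# Hypothetical example of a Sqoop export from HDFS into a Teradata table.
set -euo pipefail

sqoop export \
  --connect jdbc:teradata://td-host/DATABASE=edw \
  --username etl_user \
  --password-file hdfs:///user/etl/.td_password \
  --table DAILY_SALES \
  --export-dir /data/curated/daily_sales \
  --input-fields-terminated-by '\t' \
  --num-mappers 8
```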
Confidential, NY
Hadoop Developer
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Installed and configured Pig and developed Pig Latin scripts.
- Involved in managing and reviewing Hadoop log files.
- Imported and exported data between Teradata and HDFS on a regular basis using Sqoop.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Prepared Avro schema files for generating Hive tables.
- Created Hive tables and worked with them using HiveQL.
- Experienced in defining job flows.
- Good exposure to the NoSQL database HBase.
- Developed custom UDFs in Pig.
- Prepared shell scripts for executing Hadoop commands in a single run (see the wrapper sketch after this section).
- Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
- Setup and benchmarked Hadoop/HBase clusters for internal use.
Environment: Hadoop CDH 4.1.1, Pig 0.9.1, Avro, Oozie 3.2.0, Sqoop, Hive, Java 1.6, Eclipse, Teradata, HBase.
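Illustrative sketch (paths and directory layout are hypothetical): the kind of single-run shell wrapper for Hadoop commands referenced above, staging a local file into HDFS and verifying the upload.

```bash
#!/usr/bin/env bash
# Hypothetical example of a one-shot HDFS staging script.
set -euo pipefail

LOCAL_FILE="$1"                              # e.g. /tmp/feed.avro (caller-supplied)
HDFS_DIR="/data/incoming/$(date +%Y-%m-%d)"  # assumed landing directory layout

hadoop fs -mkdir -p "${HDFS_DIR}"
hadoop fs -put -f "${LOCAL_FILE}" "${HDFS_DIR}/"

# Fail loudly if the file did not land in HDFS.
if hadoop fs -test -e "${HDFS_DIR}/$(basename "${LOCAL_FILE}")"; then
  echo "Upload OK"
else
  echo "Upload failed" >&2
  exit 1
fi
```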
Confidential
Java/J2EE Developer
Responsibilities:
- Worked with Java, J2EE, Struts, web services, and Hibernate in a fast-paced development environment.
- Followed Agile methodology, interacted directly with the client on features, implemented optimal solutions, and tailored the application to customer needs.
- Involved in design and implementation of web tier using Servlets and JSP.
- Used Apache POI for reading Excel files.
- Developed the user interface using JSP and JavaScript to view all online trading transactions.
- Designed and developed Data Access Objects (DAO) to access the database.
- Used the DAO Factory and Value Object design patterns to organize and integrate the Java objects.
- Coded Java Server Pages for the Dynamic front end content that use Servlets and EJBs.
- Coded HTML pages using CSS for static content generation with JavaScript for validations.
- Used JDBC API to connect to the database and carry out database operations.
- Used JSP and JSTL Tag Libraries for developing User Interface components.
- Performed code reviews.
- Performed unit testing, system testing and integration testing.
- Involved in building and deploying the application in a Linux environment (see the deployment sketch after this section).
Environment: Java, J2EE, JDBC, Struts, SQL, Hibernate, Eclipse, Apache POI, CSS.
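Illustrative sketch (project path, artifact name, and a Maven build are assumptions; the original build tool is not stated): a typical Linux build-and-deploy step for a WAR onto Apache Tomcat, as referenced above.

```bash
#!/usr/bin/env bash
# Hypothetical example of building a WAR and redeploying it to Tomcat on Linux.
set -euo pipefail

cd /opt/projects/trading-app
mvn clean package                              # produces target/trading-app.war (assumed)

"${CATALINA_HOME:?}/bin/shutdown.sh" || true   # stop Tomcat if it is running
cp target/trading-app.war "${CATALINA_HOME}/webapps/"
"${CATALINA_HOME}/bin/startup.sh"
```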