Hadoop Developer Resume
Milwaukee, WI
PROFESSIONAL SUMMARY:
- 7+ years of IT experience in testing, analysis, design, development, and implementation of Big Data Hadoop data warehouse/business intelligence solutions using Hadoop, HBase, Hive, Pig, Sqoop, ZooKeeper, Spark, HDFS, Kafka, Java, Hue, MapReduce, Flume, and databases.
- Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
- Technical expertise in Big Data/Hadoop: HDFS, MapReduce, Apache Hive, Apache Pig, Sqoop, Flume, Storm, Kafka, Spark, Oozie, ZooKeeper, and NoSQL databases (HBase, Cassandra, MongoDB).
- Experience developing on NoSQL databases using CRUD operations, sharding, indexing, and replication.
- Good understanding of Apache Storm-Kafka pipelines.
- Extensive experience working with Teradata, Oracle, Netezza, Informatica, SQL Server, and MySQL databases.
- Good experience loading data from Oracle and MySQL databases into HDFS using Sqoop (structured data) and Flume (log files and XML).
- Knowledge of interactive data analysis using Apache Spark and Apache Zeppelin.
- Extensive experience developing Pig Latin scripts and using Hive Query Language for data analytics.
- Experienced in writing custom Hive UDFs to incorporate business logic into Hive queries.
- Good experience optimizing MapReduce algorithms using mappers, reducers, combiners, and partitioners to deliver the best results for large datasets.
- Understanding of Hadoop security requirements and integration with a Key Distribution Center (KDC).
- Proficient in Java, Scala and Python.
- Expertise in AWS services such as EMR and EC2, which provide fast and efficient processing of big data.
- Hands-on experience with BI tools such as Tableau and Pentaho.
- Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
- Involved in the design and development of various web and enterprise applications using technologies such as JSP, Servlets, Struts, Hibernate, Spring, JDBC, JSF, XML, JavaScript, HTML, AJAX, SOAP, and Amazon Web Services.
- Experience in constructing pipelines using workflow tools like Oozie.
- Experienced in providing real-time analytics on big data platforms using HBase, Cassandra, and MongoDB.
- Hands-on experience in application development using core Java, RDBMS, and Linux shell scripting; developed UNIX shell scripts to automate various processes.
- Experience with development environments such as Eclipse and RAD.
- Expertise in Unit Testing, Integration Testing, System Testing and experience in preparing the Test Cases, Test Scenarios and Test plans.
- Ability to work independently as well as in a team and able to effectively communicate with customers, peers and management at all levels in and outside the organization.
AREA OF EXPERTISE:
Languages: C, C++, Java (JSP, Servlets, JavaBeans, JDBC, XML), Shell Scripting
Big Data Ecosystem: Hadoop, MapReduce, YARN, Pig, Hive, HBase, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet, Snappy, Kafka
Databases: Oracle, MySQL, PL/SQL, PostgreSQL
NoSQL Databases: Cassandra, MongoDB, HBase, DynamoDB
Operating Systems: UNIX, Linux, Mac OS, Windows XP, Server 2003, Server 2008
Development Tools: Eclipse 3.3, Ant, Maven, JUnit, Log4j, ETL
Programming Languages: HTML5, CSS3, JavaScript, AJAX, jQuery, .NET, Visual Studio 2010
Network protocols: TCP/IP, UDP, HTTP, DNS, DHCP, OSPF, RIP
Frameworks: Struts, Spring, Hibernate, MVC
PROFESSIONAL EXPERIENCE:
Confidential, MILWAUKEE, WI
HADOOP DEVELOPER
Responsibilities:
- Worked with 90 TB of highly unstructured and semi-structured data (270 TB with a replication factor of 3).
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Used Pig as an ETL tool for transformations, event joins, filtering, and some pre-aggregations.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Used Spark stream processing to bring data in-memory and implemented RDD transformations and actions to process it as units.
- Created and modified UDFs and UDAFs for Hive.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Used DML statements to perform different operations on Hive Tables.
- Developed Hive queries to create foundation tables from stage data.
- Adjusted the minimum share of mappers and reducers for all queues.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Managed Amazon Web Services (AWS) EC2 with Puppet.
- Worked with the Apache Crunch library to write, test, and run Hadoop MapReduce pipeline jobs.
- Efficiently put and fetched data to/from HBase by writing MapReduce jobs in Java/Python.
- Provided cluster coordination services through ZooKeeper.
- Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL.
- Loaded and transformed large sets of semi-structured data using Pig Latin operations.
- Extracted the data from Teradata into HDFS using Sqoop.
- Performed data visualization using Tableau for reporting from Hive tables.
- Worked with SequenceFile, RCFile, Avro, and HAR file formats.
Environment: Hadoop, HDFS, Apache Crunch, MapReduce, Hive, Flume, Sqoop, ZooKeeper, Kafka, Storm, Cassandra, Spark, Puppet, Linux.
Confidential, DES MOINES, IA
HADOOP DEVELOPER
Responsibilities:
- Worked hands-on with Kafka-Storm on the HDP 2.2 platform for real-time analysis.
- Created a PoC to store server log data in MongoDB to identify system alert metrics.
- Implemented a Hadoop framework to capture user navigation across the application, validate the user interface, and provide analytic feedback/results to the UI team.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Analyzed the unused user navigation data by loading it into HDFS and writing MapReduce jobs. The analysis provided inputs to the new APM front-end developers and lucent team.
- Wrote MapReduce jobs using Java API and Pig Latin.
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Wrote Pig scripts to run ETL jobs on the data in HDFS and to perform further testing.
- Used Hive to do analysis on the data and identify different correlations.
- Used Sqoop to import data from MySQL into HDFS and Hive on a regular basis.
- Wrote Hive queries for data analysis to meet the business requirements.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Used Oozie to define and schedule Apache Hadoop jobs as a directed acyclic graph (DAG) of actions with control flows.
- Created Hive tables, worked on them using HiveQL, and performed data analysis using Hive and Pig.
- Automated regular Sqoop imports into Hive partitions using Apache Oozie.
- Supported MapReduce programs running on the cluster.
- Attended weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Used QlikView and D3 to visualize queries required by the BI team.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Flume, ZooKeeper, Cloudera Manager, Oozie, Java (JDK 1.6), MySQL, SQL, Windows NT, Linux
Confidential, Kansas City, MO
Big Data Developer
Responsibilities:
- Created HBase tables to load large sets of semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Benchmarked and performance-tuned the Hadoop cluster, namely CPU, memory, I/O, and MapReduce and YARN configuration.
- Ingested structured, unstructured, and log data into Hadoop HDFS, Netezza, and Greenplum using Spark, Sqoop, and Informatica.
- Handled data cleansing, data profiling, data lineage, denormalization, and aggregation of big data.
- Analyzed and transformed data with Hive and Pig.
- Worked with different Hive file formats such as RCFile, SequenceFile, ORC, and Parquet.
- Worked with load balancers on AWS.
- Used Splunk tools for log aggregation and implemented log data analysis.
- Automated the extraction of data from warehouses and weblogs into Hive tables by developing workflows and coordinator jobs in Oozie.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Wrote optimized Pig scripts and developed and tested Pig Latin scripts.
- Working knowledge of writing Pig Load and Store functions.
- Developed job flows to automate the workflow for Pig and Hive jobs.
- Wrote Hive queries to analyze data using Hive Query Language (HQL).
- Tested and reported defects following Agile methodology.
- Managed and reviewed Hadoop log files.
- Created an R function and a Spark stream to pull customer sentiment data from Twitter.
- Used the Pentaho Data Integration tool for data integration, OLAP analysis, and ETL processes.
- Converted ETL operations to the Hadoop system using Pig Latin operations, transformations, and functions.
Environment: Hadoop, YARN, HDFS, MapReduce, Hive, Oozie, HiveQL, Netezza, Informatica, HBase, Pig, MySQL, NoSQL, Spark, Sqoop, Pentaho
Confidential
HADOOP DEVELOPER
Responsibilities:
- Implemented business logic using Java and JavaScript, with JDBC for querying the database.
- Migrated the required data from MySQL and MongoDB into HDFS using Sqoop and imported flat files in various formats into HDFS.
- Analyzed and wrote Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Developed UDFs for both Pig and Hive in Java.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Used default MapReduce Input and Output Formats.
- Extracted files from MongoDB through Sqoop and placed in HDFS and processed.
- Building, packaging and deploying the code to the Hadoop servers.
- Used HCatalog to access Hive table metadata from MapReduce or Pig code.
- Used Java MapReduce to compute various metrics that define user experience, revenue, etc.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
- Developed the user interfaces using HTML, CSS, AJAX, and JavaScript.
- Worked with client-side technologies such as HTML, CSS, JavaScript, and jQuery.
- Worked on HBase by using CRUD (Create, Read, Update and Delete), Indexing, Replication and Sharding features.
- Conducted data extraction, including analysis, review, and modeling based on requirements, using higher-level tools such as Hive and Pig.
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, MySQL, MongoDB, Flume, Oracle 11g, Core Java, HTML, Eclipse, Tableau.
