
Hadoop Lead Developer Resume


Halifax

SUMMARY

  • Over 6 years of professional IT experience, including 4+ years in the Big Data ecosystem covering ingestion, querying, processing, and analysis of big data.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Hands-on experience in installing and configuring Hadoop ecosystem components such as Oozie, Hive, Sqoop, Zookeeper, Pig, and Flume.
  • Good exposure to MapReduce programming (Java/Python), Hive, Pig scripting, and Spark SQL (Scala/Python).
  • Experience in building data pipelines using Kafka and Spark (see the sketch after this list).
  • Experience in managing and reviewing Hadoop log files.
  • Hands-on experience importing and exporting data using Sqoop, the Hadoop data-transfer tool.
  • Experience with distributed systems, large-scale non-relational data stores, RDBMS, NoSQL, map-reduce systems, data modeling, database performance, and multi-terabyte data warehouses.
  • Experience in designing, developing, and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem. Worked extensively on building a rapid development framework using Core Java.
  • Extensive experience in, and active involvement with, requirement gathering, analysis, design, reviews, coding, code reviews, and unit and integration testing.
  • Extensive experience in designing front-end interfaces using HTML, CSS, JavaScript, and Ajax.
  • Good experience using object-relational mapping tools such as Hibernate.
  • Experience with the Spring Framework, including Spring IoC, Spring Resources, and Spring JDBC.
  • Experience with various IDEs like IntelliJ, Eclipse, JBuilder and Velocity Studio.
  • Implemented service projects using Agile methodology and was involved in running scrum meetings.
  • Implemented core product projects using Lean and Kanban methodologies and was involved in delivering a high-quality health care product.
  • Worked with BI teams to generate reports and design ETL workflows in Tableau.
  • Experience in developing web-services using REST, SOAP, WSDL and Apache AXIS2.
  • Experience in writing SQL queries.
  • Mapped VOs in the domain model to tables in the relational model and generated SQL scripts in ConceptWave.
  • Coded JDBC calls in ConceptWave to access the Oracle database tables.
  • Experience in designing and developing UI screens using HTML, CSS, and JavaScript.
  • Experience in AWS cloud environment on S3 storage and EC2 instances.
  • Used CVS, GIT and SVN for Source code version control.
  • Experience in designing transaction processing systems deployed on various application servers including Tomcat, WebSphere, and WebLogic.
  • Good experience with Quality Control, JIRA, and FishEye for ticket tracking: accepting tickets/defects, submitting tickets, reviewing code, and closing tickets.
  • Designed dynamic user interfaces using AJAX and jQuery to retrieve data without reloading the page and to send asynchronous requests.
  • Excellent Experience in Code Refactoring.
  • Excellent Client interaction skills and proven experience in working independently as well as in a team.
  • Excellent communication, analytical, interpersonal and presentation skills.
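
As a sketch of the Kafka-and-Spark pipeline work referenced above: a minimal PySpark Structured Streaming job that reads a Kafka topic, parses JSON payloads, and lands them on HDFS as Parquet. It assumes the Spark Kafka connector is on the classpath; the broker address, topic name, schema, and paths are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import DoubleType, StringType, StructType

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # Hypothetical event schema; adjust to the actual topic payload.
    schema = StructType().add("id", StringType()).add("amount", DoubleType())

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
           .option("subscribe", "events")                      # hypothetical topic
           .load())

    # Kafka delivers bytes; cast to string and parse the JSON payload.
    parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
                 .select("e.*"))

    # Land the parsed stream on HDFS as Parquet, with checkpointing for recovery.
    (parsed.writeStream.format("parquet")
           .option("path", "/data/events")             # hypothetical output path
           .option("checkpointLocation", "/tmp/ckpt")  # hypothetical checkpoint dir
           .start()
           .awaitTermination())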

TECHNICAL SKILLS

Hadoop Ecosystem: Kafka, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Flume, Oozie, Zookeeper, Ambari, Hue, Spark, Storm, Ganglia

Project Management / Tools / Applications: All MS Office suites (incl. 2003), MS Exchange & Outlook, Lotus Domino Notes, Citrix Client, SharePoint, MS Internet Explorer, Firefox, Chrome, Apache, IIS

Web Technologies: JDBC, Servlets, JSP, JSTL, JNDI, XML, HTML, CSS and AJAX

NoSQL Databases: HBase, Cassandra

Databases: Oracle 8i/9i/10g, MySQL

Languages: Java, SQL, PL/SQL, Ruby, Shell Scripting

Operating Systems: UNIX (OS X, Solaris), Windows, Linux (CentOS, Fedora, Red Hat)

Frameworks: Struts, ConceptWave, ATG 7.0

Application Server: Apache Tomcat

Hadoop Platforms: Cloudera, Hortonworks, MapR

PROFESSIONAL EXPERIENCE

Confidential, Halifax

Hadoop Lead Developer

Responsibilities:

  • Worked on multiple projects using various Big Data Technologies.
  • Worked on data science activities and developed scatter plots using RStudio.
  • Worked on various types of machine learning algorithms for Regression, Classification and Clustering.
  • Created automated Python scripts to validate data flow through Elasticsearch (see the validation sketch after this list).
  • Experience in AWS cloud environment on S3 storage and EC2 instances.
  • Created high-level and technical architecture for Hortonworks Hadoop: Hive for data warehousing support on HDFS (batch processing), and Impala and Spark for query processing and real-time data analytics.
  • Worked on evaluation and analysis of the Hadoop cluster and various big data analytics tools, including Pig, the HBase database, and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from LINUX file system to Hadoop Distributed File System.
  • Created HBase tables to store data in various formats coming from different portfolios.
  • Experience in managing and reviewing Hadoop log files.
  • Set up the ELK (Elasticsearch, Logstash, Kibana) cluster.
  • Troubleshot Nova and Glance issues on the Kafka and RabbitMQ bus.
  • Performance-tested the environment by creating Python scripts to generate I/O and CPU load.
  • Involved in Design, development, implementation and documentation in various big data technologies.
  • Used the Django framework to develop web applications implementing the MVC architecture.
  • Used Django APIs for database access.
  • Designed and developed adapters to inject data into and eject data from Kafka for various data sources.
  • Designed and developed HBase tables according to the tenants' needs, taking performance considerations into account.
  • Documenting the process of designing and developing the HBase tables.
  • Interact with various business teams to document the requirements for HBase tables.
  • Developed Spark applications to move data into HBase tables from various sources such as relational databases and Hive.
  • Designed, developed, and documented various Sqoop scripts to pull data into the Hadoop ecosystem (see the Sqoop sketch after this list).
  • Maintained and scheduled various Spark, MapReduce, and Sqoop jobs according to business needs, maintaining data consistency using Oozie.
  • Troubleshot runtime issues with Spark and MapReduce applications using logs.
  • Developed Spark/MapReduce jobs to parse JSON and XML data.
  • Optimized Spark applications both during development and when submitting them to the cluster with various environment arguments.
  • Developed Apache Pig and Hive scripts to load, store, and process data.
  • Worked closely with analysts and architects to understand business requirements for system enhancements and data analytics (using Hadoop OLAP principles).
  • Working knowledge in writing Pig's Load and Store functions.
  • Developed Sqoop scripts to load data from relational databases directly into Hive and HBase.
  • Demonstrated various functional capabilities of Spark.
  • Created customer relationship dashboard for executives using Tableau.
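
One plausible shape for the Elasticsearch validation scripts mentioned above, using the official elasticsearch Python client; the host, index name, and expected counts are hypothetical:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://es-node1:9200"])  # hypothetical host

    def validate_index(index, expected_count):
        """Check that the document count in an index matches the expected value."""
        actual = es.count(index=index)["count"]
        if actual != expected_count:
            raise ValueError("%s: expected %d docs, found %d"
                             % (index, expected_count, actual))
        return actual

    validate_index("app-logs-2017.01.15", 125000)  # hypothetical index and count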
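
And a minimal sketch of a Sqoop pull of the kind described, driven from Python via subprocess so it can be wired into the same automation; the JDBC URL, credentials file, and table names are hypothetical:

    import subprocess

    # Hypothetical connection details; in practice these come from configuration.
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://db-host:3306/sales",
        "--username", "etl_user",
        "--password-file", "/user/etl/.db_password",
        "--table", "orders",
        "--hive-import",                  # land the data directly in Hive
        "--hive-table", "staging.orders",
        "--num-mappers", "4",
    ]
    subprocess.check_call(cmd)  # raises CalledProcessError if the Sqoop job fails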

Environment: ElasticSearch, Logstash, Ansible, Tableau, Python, Kafka, Streamsets, RStudio, Sensu, Oozie, Kibana, Hive, Pig, Hbase, Sqoop.

Confidential, Regina, Saskatchewan

Hadoop Developer

Responsibilities:

  • Implemented a CDH3 Hadoop cluster on CentOS.
  • Implemented POCs to configure DataStax Cassandra with Hadoop.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances.
  • Installed the application on AWS EC2 instances and configured the storage on S3 buckets.
  • Responsible for migrating the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift and DynamoDB.
  • Built a POC for data search using Elasticsearch.
  • Along with the infrastructure team, designed and developed a Kafka- and Storm-based data pipeline; this pipeline also involves Amazon Web Services EMR, S3, and RDS.
  • Launched instances with respect to specific applications.
  • Converted data models from Oracle to MySQL and Hive tables.
  • Led initiatives in developing cloud-based SaaS solutions for the design market.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Developed RDDs/DataFrames in Spark using Scala and Python and applied transformation logic to load data from the Hadoop data lake into Cassandra (see the Cassandra sketch after this list).
  • Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
  • Involved in performance optimization of queries and stored procedures by analyzing query plans, blocking queries, and identifying missing indexes.
  • Streamed data in real time using Spark with Kafka.
  • Created tables in Hive and loaded data from HDFS into them (see the Hive sketch after this list).
  • Hands-on experience loading data from the UNIX file system to HDFS.
  • Performed Cassandra query operations using the Thrift API for real-time analytics.
  • Worked on Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
  • Provided cluster coordination services through Zookeeper.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Involved in creating Hive tables, loading data, and running Hive queries on that data.
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
  • Experience in developing Maven and Ant scripts to automate the compilation, deployment, and testing of web applications.
  • Developed Apache Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying.
  • Involved in writing optimized Pig Script along with involved in developing and testing Pig Latin Scripts.
  • Working knowledge in writing Pig's Load and Store functions.
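
As a sketch of the data-lake-to-Cassandra loading described above: a minimal PySpark job assuming the DataStax spark-cassandra-connector is available; the connection host, lake path, transformation, and keyspace/table names are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("lake-to-cassandra")
             .config("spark.cassandra.connection.host", "cass-node1")  # hypothetical
             .getOrCreate())

    # Read curated data from the Hadoop data lake (hypothetical path).
    df = spark.read.parquet("/datalake/curated/customers")

    # Representative transformation logic before the load.
    out = df.dropDuplicates(["customer_id"]).filter(df["active"] == True)

    # Write through the DataStax connector (hypothetical keyspace/table).
    (out.write.format("org.apache.spark.sql.cassandra")
        .options(keyspace="crm", table="customers")
        .mode("append")
        .save())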
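
And a minimal sketch of the Hive table creation and HDFS load, issued here through a Hive-enabled SparkSession to stay in Python; the database, table, columns, and paths are hypothetical:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("hive-load")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical table, partitioned by load date for efficient pruning.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS staging.clicks (
            user_id STRING,
            url     STRING
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
    """)

    # Move a day's files from HDFS into the matching partition.
    spark.sql("""
        LOAD DATA INPATH '/landing/clicks/2016-11-01'
        INTO TABLE staging.clicks
        PARTITION (load_date = '2016-11-01')
    """)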

Environment: Apache Hadoop, MapReduce, Scala, HDFS, Python, Zookeeper, Sqoop, Kafka, MySQL, Cassandra, Redshift, DynamoDB, Hive, Pig, Oozie, Spark SQL, Cloudera CDH3, Oracle, Maven, Ant, Eclipse, Amazon EC2, EMR, S3.

Confidential, Boston, MA

Hadoop Developer

Responsibilities:

  • Installed and configured Pig and wrote Pig Latin scripts.
  • Involved in managing and reviewing Hadoop Job tracker log files and control-m log files.
  • Scheduled and managed cron jobs and wrote shell scripts to generate alerts.
  • Monitoring and managing daily jobs, processing around 200k files per day and monitoring those through RabbitMQ and Apache Dashboard application.
  • Used Control-m scheduling tool to schedule daily jobs.
  • Experience in administering and maintaining a multi-rack Cassandra cluster.
  • Monitored workload, job performance and capacity planning using InsightIQ storage performance monitoring and storage analytics, experienced in defining job flows.
  • Gained good experience with NoSQL databases such as Cassandra and HBase.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers/sensors.
  • Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis (see the streaming sketch after this list).
  • Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
  • Created internal and external Hive tables as per requirements, defined with appropriate static and dynamic partitions for efficiency.
  • Worked on setting up High Availability for GPHD 2.2 with Zookeeper and quorum journal nodes.
  • Used the Control-M scheduling tool to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java map-reduce, Hive, and Sqoop, as well as system-specific jobs.
  • Worked with BI teams to generate reports and design ETL workflows in Tableau.
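
A minimal sketch of the cleansing MapReduce work described above, written as a Hadoop Streaming mapper in Python; the field layout and validity rules are assumptions. It would run under the streaming jar, e.g. hadoop jar hadoop-streaming.jar -mapper cleanse.py -numReduceTasks 0 -input ... -output ... for a map-only cleanse.

    #!/usr/bin/env python
    # cleanse.py -- Hadoop Streaming mapper: drop malformed records,
    # normalize whitespace and case, and emit tab-separated fields.
    import sys

    EXPECTED_FIELDS = 5  # hypothetical record width

    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) != EXPECTED_FIELDS:
            continue  # skip malformed rows
        cleaned = [f.strip().lower() for f in fields]
        if not cleaned[0]:
            continue  # skip rows missing the key field
        print("\t".join(cleaned))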

Environment: Apache Hadoop 2.3, gphd-1.2, gphd-2.2, MapReduce 2.3, HDFS, Hive, Java 1.6 & 1.7, Cassandra, Pig, SpringXD, Linux, Eclipse, RabbitMQ, Zookeeper, PostgreSQL, Apache Solr, Control-M, Redis, Tableau, Qlikview, DataStax.

Confidential

Python Developer

Responsibilities:

  • Involved in the analysis, design, and architecture of the Account module for Punjab Bank's web-based application.
  • Worked on requirement gathering and High-level design.
  • Created a PHP/MySQL backend for data entry from Flash; assisted the Flash developer in sending the correct data via query strings. Used HTML/CSS, XML, and JavaScript for UI development.
  • Converted Visual Basic Application to Python, MySQL.
  • Generated Django forms and Crispy Forms to record data and handle login and signup of online users (see the form sketch after this list).
  • Experience in developing test automation.
  • Designed and implemented a random unique test selector package for processing large volumes of data using Python and the Django ORM.
  • Skilled in using collections in Python for manipulating and looping through different user-defined objects.
  • Created database tables, functions, and stored procedures, and wrote prepared statements using PL/SQL.
  • Modified queries, functions, cursors, triggers and stored procedures for MySQL database to improve performance, while processing data
  • Performed extensive code review using GitHub pull requests, improved code quality, and conducted peer review meetings.
  • Responsible for parsing XML data using an XML parser, testing, fixing bugs, and making code modifications.
  • Performed database administration activities such as taking backups, checking log messages, and looking for database optimizations.
  • Used Redis as a message broker to execute asynchronous tasks (see the task sketch after this list).
  • Designed and implemented a dedicated MySQL database server to drive the web applications and report on daily progress.
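
As a sketch of the signup handling mentioned above: a minimal Django form of the era (compatible with Django 1.4); the field names and validation rules are assumptions.

    from django import forms
    from django.contrib.auth.models import User

    class SignupForm(forms.Form):
        """Minimal signup form for registering online users."""
        username = forms.CharField(max_length=30)
        email = forms.EmailField()
        password = forms.CharField(widget=forms.PasswordInput)

        def clean_username(self):
            # Reject usernames that are already registered.
            username = self.cleaned_data["username"]
            if User.objects.filter(username=username).exists():
                raise forms.ValidationError("Username already taken.")
            return username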
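
And a sketch of the Redis-backed asynchronous tasks; the resume names only Redis as the broker, so the use of Celery here is an assumption (a common pairing), and the Redis URL and task body are hypothetical.

    from celery import Celery

    # Hypothetical Redis broker URL; Celery itself is an assumed choice.
    app = Celery("tasks", broker="redis://localhost:6379/0")

    @app.task
    def send_signup_email(user_id):
        """Runs on a worker process instead of blocking the web request."""
        print("sending welcome email to user %s" % user_id)

    # Enqueued from a view with: send_signup_email.delay(user.id)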

Environment: Python 2.7, Django 1.4, Jenkins, MySQL, Linux, HTML, CSS, jQuery, JavaScript, Apache, Git

Confidential

Python Developer

Responsibilities:

  • Used the SDLC process to develop website functionality.
  • Designed and developed the UI of the website using HTML, XHTML, AJAX, CSS, and JavaScript.
  • Developed entire frontend and backend modules using Python on the Django web framework, backed by SQL Server.
  • Used Django APIs for database access.
  • Designed and developed a data management system using MySQL. Built application logic using Python 2.7.
  • Parsed XML files using Python to extract data from the database (see the XML parsing sketch after this list). Participated in requirement gathering and worked closely with the architect on design and modeling.
  • Worked on development of SQL, stored procedures, triggers, and functions.
  • Developed shopping cart for Library and integrated web services to access the payment (E-commerce).
  • Designed and developed horizontally scalable APIs using Python Flask (see the Flask sketch after this list).
  • Used PHP on a LAMP server to develop pages.
  • Used Flex for validation of request/response and schema.
  • Developed dynamic interaction pages in .NET with MS Visual Basic, using SQL developer tools.
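
A sketch of the XML parsing mentioned above, using the standard-library ElementTree; the file name and element layout are hypothetical.

    import xml.etree.ElementTree as ET

    # Hypothetical export file with <book isbn="..."><title/><price/></book> entries.
    tree = ET.parse("catalog.xml")
    root = tree.getroot()

    records = []
    for item in root.findall("book"):
        records.append({
            "isbn": item.get("isbn"),                    # attribute
            "title": item.findtext("title", ""),         # child element text
            "price": float(item.findtext("price", "0")),
        })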
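
And a sketch of a horizontally scalable Flask endpoint of the kind described; the route, resource, and in-memory store are hypothetical stand-ins for the real database.

    from flask import Flask, jsonify

    app = Flask(__name__)

    # Hypothetical in-memory store standing in for the real database.
    BOOKS = {1: {"title": "Example Book", "available": True}}

    @app.route("/api/books/<int:book_id>", methods=["GET"])
    def get_book(book_id):
        book = BOOKS.get(book_id)
        if book is None:
            return jsonify({"error": "not found"}), 404
        return jsonify(book)

    if __name__ == "__main__":
        # Stateless handlers like this scale horizontally behind a load balancer.
        app.run(host="0.0.0.0", port=5000)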

Environment: Python 2.6/2.7, JavaScript, Django Framework 1.3, SQL, jQuery, Adobe Dreamweaver, Apache web server, PHP, SQL developer tool, Flex, Apache Cassandra.
