
Hadoop Consultant Resume


Minneapolis

SUMMARY

  • Over 8 years of overall experience as a software developer in designing, developing, deploying, and supporting large-scale distributed systems.
  • Over 3 years of extensive experience as a Hadoop Developer and Big Data Analyst.
  • Primary technical skills in HDFS, MapReduce, YARN, Pig, Hive, Sqoop, HBase, Flume, Oozie, and ZooKeeper.
  • Good experience in extracting data and generating statistical analysis using the Business Intelligence tool QlikView for better analysis of data.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce, with experience writing MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
  • Hands-on experience working with ecosystem tools such as Hive, Pig, Sqoop, MapReduce, Flume, and Oozie. Strong knowledge of Pig and Hive analytical functions, and experience extending Hive and Pig core functionality by writing custom UDFs (see the sketch after this list).
  • Experience in importing and exporting terabytes of data between HDFS and relational database systems using Sqoop.
  • Knowledge of job workflow scheduling and monitoring tools such as Oozie and Ganglia; NoSQL databases such as HBase, Cassandra, and BigTable; and administrative tasks such as installing Hadoop, commissioning and decommissioning nodes, and managing ecosystem components such as Flume, Oozie, Hive, and Pig.
  • Experience in the design, development, and testing of distributed, Internet/intranet/e-commerce, client/server, and database applications, mainly using Java, Servlets, JDBC, JSP, Struts, Hibernate, Spring, and JavaScript on WebLogic and Apache Tomcat web/application servers, with Oracle and SQL Server databases on UNIX and Windows NT platforms.
  • Extensive experience with databases such as SQL Server and Oracle 11g.
  • Experience in writing SQL queries, Stored Procedures, Triggers, Cursors and Packages.
  • Good experience in writing optimized MapReduce jobs using Java.
  • Worked on ETL tools such as Informatica for the Confidential project.
  • Experience and good understanding in remodeling the enterprise BI application from the existing data model to the Netezza Hot Appliance model.
  • Experience in implementing the physical data model in Netezza Database.
  • Experience in supporting the Attunity 'Click-2-Load' solution for data replication on the Netezza IBM PureData System for Analytics.
  • Experience in handling XML files and related technologies such as the Informatica XML parser and XML writer.
  • Experience in writing reusable UNIX scripts for Informatica jobs used for data movement and transformation.
  • Good knowledge of and experience in performance tuning of ETL/ELT jobs built on Informatica in live systems.
  • Experience in writing database objects such as stored procedures and triggers for Oracle, MS SQL Server, and Netezza databases; good knowledge of PL/SQL and hands-on experience writing medium-complexity SQL queries.
  • Experience in writing Teradata BTEQ, FastLoad, and MultiLoad scripts for ELT/ETL purposes.
  • Good knowledge of Impala, Spark, Shark, Storm, and Ganglia.
  • Expertise in preparing test cases, documenting, and performing unit and integration testing.
  • In-depth understanding of data structures, algorithms, and optimization.
  • Strong knowledge of Software Development Life Cycle and expertise in detailed design documentation.
  • Fast learner with good interpersonal skills, strong analytical and communication skills, and an interest in problem solving and troubleshooting.
  • Self-motivated, excellent team player with a positive attitude who adheres to strict deadlines.
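
Below is a minimal sketch of the kind of custom Hive UDF referred to in the summary above. The class name, function name, and the normalization it performs are illustrative assumptions, and the example targets the classic org.apache.hadoop.hive.ql.exec.UDF API.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that normalizes free-form state names to two-letter codes,
// the kind of cleanup that Hive's built-in functions do not cover.
@Description(name = "normalize_state", value = "_FUNC_(str) - returns a two-letter state code")
public final class NormalizeState extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;                              // pass NULLs through
        }
        String value = input.toString().trim().toLowerCase();
        if (value.equals("minnesota") || value.equals("mn")) {
            return new Text("MN");
        }
        return new Text(value.toUpperCase());         // fall back to the raw value
    }
}
```

Packaged into a JAR, a function like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.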

TECHNICAL SKILLS

Operating Systems: Windows, Linux (Fedora, CentOS), UNIX

Languages and Technologies: C, C++, Java, SQL, PL/SQL

Scripting Languages: Shell scripting

Databases: Oracle, MySQL, PostgreSQL

IDE: Eclipse and NetBeans

Application Servers: Apache Tomcat server, Apache HTTP webserver

Versioning Systems: Git, SVN

Hadoop Ecosystem: Hadoop MapReduce, HDFS, Flume, Sqoop, Hive, Pig, Oozie, Cloudera Manager, ZooKeeper, AWS EC2

Apache Spark: Spark, Spark SQL, Spark Streaming.

Cluster Management and Monitoring: Cloudera Manager, Hortonworks Ambari, Ganglia and Nagios.

Security: Kerberos.

NoSQL: Cassandra, HBase, DataStax

PROFESSIONAL EXPERIENCE

Confidential

Hadoop Consultant

Responsibilities:

  • Used Sqoop to transfer data between RDBMS and HDFS.
  • Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
  • Implemented complex MapReduce programs to perform map-side joins using the distributed cache (see the sketch after this list).
  • Designed and implemented custom writables, custom input formats, custom partitioners, and custom comparators in MapReduce.
  • Thoroughly tested MapReduce programs using the MRUnit and JUnit testing frameworks.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Converted existing SQL queries into HiveQL queries.
  • Implemented UDFs, UDAFs, and UDTFs in Java for Hive to handle processing that cannot be done with Hive's built-in functions.
  • Effectively used Oozie to develop automated workflows of Sqoop, MapReduce, and Hive jobs.
  • Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Responsible for preparing the technical requirements for Informatica ETL mapping development on the Informatica Cloud Services/Salesforce/Windows platform, based on BDD logical models.
  • Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
  • Loaded and analyzed Omniture logs generated by different web applications.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, zip, XML, and JSON.
  • Refined the Website clickstream data from Omniture logs and moved it into Hive.
  • Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
  • Defined job flows and developed simple to complex MapReduce jobs as per the requirements.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Developed Pig UDFs for manipulating data according to business requirements and also developed custom Pig loaders.
  • Wrote a Pig UDF to extract area information from the large volumes of sensor data received.
  • Responsible for creating Hive tables based on business requirements.
  • Implemented Partitioning, Dynamic Partitions and Buckets in HIVE for efficient data access.
  • Involved in NoSQL database design, integration and implementation.
  • Loaded data into the NoSQL database HBase.
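
Below is a condensed sketch of the map-side join pattern mentioned in the list above, where a small lookup file shipped through the distributed cache is joined against the large map input. The file layout, delimiter, and field positions are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Joins a large transaction file (the map input) against a small customer lookup
// file distributed to every node via job.addCacheFile(...) at submission time.
public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> customers = new HashMap<String, String>();

    @Override
    protected void setup(Context context) throws IOException {
        // The cache file is symlinked into the task's working directory by name.
        // Assumed layout: customerId,customerName
        String cacheName = new Path(context.getCacheFiles()[0].getPath()).getName();
        BufferedReader reader = new BufferedReader(new FileReader(cacheName));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",");
                customers.put(parts[0], parts[1]);
            }
        } finally {
            reader.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed input layout: customerId,amount,...
        String[] fields = value.toString().split(",");
        String name = customers.get(fields[0]);
        if (name != null) {                            // inner-join semantics
            context.write(new Text(fields[0]), new Text(name + "\t" + fields[1]));
        }
    }
}
```

A setup like this assumes the lookup side fits comfortably in memory on every mapper; anything larger would call for a reduce-side join instead.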

Environment: Amazon Web Services (AWS EC2), Sqoop, MapReduce, Hive, Oozie, Pig Latin, HBase

Confidential, Minneapolis

Big Data Engineer

Responsibilities:

  • Involved in building a multi-node Hadoop cluster.
  • Installed and configured Cloudera Manager.
  • Managed and analyzed Hadoop Log Files.
  • Managed jobs using Fair Scheduler.
  • Configured Hive Metastore to use Oracle database to establish multiple user connections to hive tables.
  • Responsible for deriving new requirements for ELT applications based on a business data-driven method.
  • Imported data into HDFS using Sqoop.
  • Retrieved data from databases such as MySQL and Oracle into HDFS using Sqoop and ingested it into HBase.
  • Developed Hive Queries to analyze the data in HDFS to identify issues and behavioral patterns.
  • Worked on shell scripting to automate jobs.
  • Used Pig Latin to analyze datasets and perform transformation according to business requirements.
  • Configured Nagios for receiving alerts on critical failures in the cluster by integrating with custom Shell Scripts.
  • Configured the Ganglia monitoring tool to monitor both Hadoop and system specific metrics.
  • Implemented Flume to import streaming log data and aggregate it into HDFS.
  • Implemented MapReduce programs to perform joins using secondary sorting and the distributed cache, as in the sketch below.
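
Below is a minimal sketch of the composite-key half of a secondary sort, as referenced in the last bullet. The field names are illustrative assumptions, and a complete job would pair this key with a partitioner and grouping comparator that consider only the natural key.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Composite key that sorts primarily by a natural key (customerId here) and
// secondarily by timestamp, so each reducer sees a customer's records in order.
public class CustomerTimeKey implements WritableComparable<CustomerTimeKey> {

    private String customerId;
    private long timestamp;

    public CustomerTimeKey() {
        // no-arg constructor required by Hadoop serialization
    }

    public CustomerTimeKey(String customerId, long timestamp) {
        this.customerId = customerId;
        this.timestamp = timestamp;
    }

    public String getCustomerId() {
        return customerId;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(customerId);
        out.writeLong(timestamp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        customerId = in.readUTF();
        timestamp = in.readLong();
    }

    @Override
    public int compareTo(CustomerTimeKey other) {
        int byCustomer = customerId.compareTo(other.customerId);
        return byCustomer != 0 ? byCustomer : Long.compare(timestamp, other.timestamp);
    }
}
```

The matching partitioner and grouping comparator would hash and group on customerId alone, so all records for a customer reach the same reduce call already ordered by timestamp.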

Environment: MapReduce, Sqoop, HBase, Pig, Flume, Ganglia, HDFS, Hive

Confidential

Hadoop Engineer

Responsibilities:

  • Installed, configured and deployed Hadoop cluster for development, production and testing.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Performed operating system installation, Hadoop version updates using automation tools.
  • Worked on setting up high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
  • Upgraded the Hadoop cluster from CDH3 to CDH4.
  • Configured Ganglia, which included installing the gmond and gmetad daemons that collect all the metrics running on the distributed cluster and present them in real-time dynamic web pages, helping with debugging and maintenance.
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Designed and allocated HDFS quotas for multiple groups.
  • Implemented rack aware topology on the Hadoop cluster.
  • Troubleshot production-level issues in the cluster and its functionality.
  • Responsible for remodeling the existing business logic into new Netezza models.
  • Understood the existing SQL Server stored procedure logic and converted it into ETL requirements.
  • XML generation process: identified the required NZ source tables from the remodeled NZ tables.
  • Identified the rule sets to be applied to each client's information along with member and provider information.
  • Validated the data received, generated the XML files for each client, and transferred them to the required third parties/downstream systems.
  • Modified the generated XML files using an XML formatter/validator/beautifier per business owner and third-party requirements.
  • Supported the Attunity 'Click-2-Load' solution for data replication on the Netezza IBM PureData System for Analytics.
  • Prepared UNIX scripts for SFTP transfer of XML files to different vendors on external servers.
  • Created an SSIS package for loading validated provider information into the MS SQL Server database.
  • Performed unit testing and system testing of mappings.
  • Scheduled the ETL jobs using the Control-M scheduler.
  • Monitored the daily/weekly DW ETL workflows.

Environment: Informatica PowerCenter 9.1, Oracle 11g, Netezza 7.0, MS SQL Server, AutoSys, Toad, WinSQL, XML Reader, Windows XP, UNIX, HDFS, Ganglia, ZooKeeper, Hadoop clusters, Hive

Confidential

Java Developer

Responsibilities:

  • Utilized Agile Methodologies to manage full life-cycle development of the project.
  • Implemented MVC design pattern using Struts Framework.
  • Used Struts form classes to write the routing logic and to call different services.
  • Created Tiles definitions, struts-config files, validation files, and resource bundles for all modules using the Struts framework.
  • Developed the web application using JSP custom tag libraries, Struts Action classes, and ActionForms (see the sketch after this list).
  • Designed Java Servlets and Objects using J2EE standards.
  • Used JSP for the presentation layer and developed a high-performance object/relational persistence and query service for the entire application using Hibernate.
  • Developed the XML Schema and Web services for the data maintenance and structures.
  • Used WebSphere Application Server to develop and deploy the application.
  • Worked with style sheets, including Cascading Style Sheets (CSS).
  • Involved in coding JUnit test cases.
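
Below is a small sketch of the Struts 1.x Action/ActionForm pattern referred to in the list above. The form fields, forward names, and the authentication check are hypothetical placeholders for the real routing and service logic.

```java
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

// Hypothetical ActionForm backing a login JSP; the field names are illustrative.
class LoginForm extends ActionForm {
    private String username;
    private String password;

    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }
    public String getPassword() { return password; }
    public void setPassword(String password) { this.password = password; }
}

// Hypothetical Action that routes the request and forwards to a view
// ("success" or "failure") defined in struts-config.xml.
public class LoginAction extends Action {

    @Override
    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response) {
        LoginForm loginForm = (LoginForm) form;
        boolean authenticated = loginForm.getUsername() != null
                && loginForm.getPassword() != null
                && !loginForm.getPassword().isEmpty();   // stand-in for a real service call
        return mapping.findForward(authenticated ? "success" : "failure");
    }
}
```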

Environment: WebSphere, CSS, J2EE, XML, Web Services, MVC using the Struts framework

Confidential

Junior Java Developer

Responsibilities:

  • Involved in design phase meetings for Business Analysis and Requirements gathering.
  • Worked with business functional lead to review and finalize requirements and data profiling analysis.
  • Worked on entry level Java programming assignments.
  • Responsible for gathering the requirements, designing and developing the applications.
  • Worked on UML diagrams for the project use case.
  • Worked with Java string manipulation to parse CSV data for applications (see the sketch after this list).
  • Connected Java applications to a database to read and write data.
  • Developed static and dynamic Web Pages using JSP, HTML and CSS.
  • Worked on JavaScript for client-side data validation.
  • Involved in structuring the wiki and forums for product documentation.
  • Maintained the customer support portal.
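
Below is a small sketch of the kind of string-manipulation CSV parsing mentioned in the list above. The record layout (name, email, age) and the Customer type are illustrative assumptions, not the original application's data model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Parses simple comma-separated records (no quoted fields) into typed objects
// using plain String manipulation.
public class CsvParser {

    // Illustrative record type; the fields are assumptions.
    static class Customer {
        final String name;
        final String email;
        final int age;

        Customer(String name, String email, int age) {
            this.name = name;
            this.email = email;
            this.age = age;
        }
    }

    static List<Customer> parse(List<String> lines) {
        List<Customer> customers = new ArrayList<Customer>();
        for (String line : lines) {
            if (line == null || line.trim().isEmpty()) {
                continue;                          // skip blank lines
            }
            String[] fields = line.split(",", -1); // keep trailing empty fields
            if (fields.length < 3) {
                continue;                          // skip malformed rows
            }
            customers.add(new Customer(
                    fields[0].trim(),
                    fields[1].trim(),
                    Integer.parseInt(fields[2].trim())));
        }
        return customers;
    }

    public static void main(String[] args) {
        List<Customer> parsed = parse(Arrays.asList("Alice,alice@example.com,34"));
        System.out.println(parsed.size() + " record(s) parsed");
    }
}
```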

Environment: Java, Servlets, JSP, JavaScript, HTML, PHP, CSS, Eclipse
