Sr. Hadoop Consultant Resume
Mentor, OH
SUMMARY
- Over 9 years of experience in Information Technology, including Big Data, the Hadoop ecosystem, and Core Java/J2EE, with strong skills in design, software processes, requirement gathering, analysis, and development of software applications.
- Experience in Hadoop administration activities such as installation, configuration, and management of clusters in Cloudera (CDH4, CDH5) and Hortonworks (HDP) distributions using Cloudera Manager and Ambari.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Impala, Sqoop, Pig, Oozie, Zookeeper, Spark, Solr, Hue, Flume, Storm, Kafka, and YARN.
- Very good knowledge of and experience with Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of big data.
- Experienced in performance tuning of YARN, Spark, and Hive, and in developing MapReduce programs with Apache Hadoop to analyze big data as per requirements.
- Extensively worked on Spark using Scala on clusters for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle.
- Experienced in importing and exporting data between HDFS and relational database management systems using Sqoop, and in troubleshooting any related issues.
- Extensive experience working with various distributions of Hadoop, such as the enterprise versions of Cloudera (CDH4/CDH5) and Hortonworks, with good knowledge of the MapR distribution, IBM BigInsights, and Amazon EMR (Elastic MapReduce).
- Exposure to Data Lake implementation using Apache Spark; developed data pipelines, applied business logic using Spark, and used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark (see the sketch after this summary).
- Good understanding of and experience with NameNode HA architecture, and experience monitoring cluster health using Ambari, Nagios, Ganglia, and cron jobs.
- Experienced in cluster maintenance and commissioning/decommissioning of DataNodes, with a good understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Experienced in implementing security controls using Kerberos principals, ACLs, and data encryption with dm-crypt to protect entire Hadoop clusters.
- Well-versed in Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Expertise in installation, administration, patching, upgrades, configuration, performance tuning, and troubleshooting of Red Hat Linux, SUSE, CentOS, AIX, and Solaris.
- Experienced in scheduling recurring Hadoop jobs with Apache Oozie, and in JumpStart, Kickstart, infrastructure setup, and installation methods for Linux.
- Good troubleshooting skills and understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networking.
- Experience in administration of relational databases such as MS SQL Server.
- Experienced in Hadoop Distributed File System and Ecosystem (MapReduce, Pig, Hive, Sqoop, YARN, MongoDB and HBase) and knowledge of NoSQL databases such as HBase, Cassandra and MongoDB.
- Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, and excellent interpersonal, technical, and communication skills.
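A minimal PySpark sketch of the Hive/SQL-to-Spark conversion mentioned above, written against the Spark 2.x API; the table and column names (sales, region, amount) are hypothetical placeholders, not names from an actual engagement.

```python
# Rewriting a Hive aggregation as Spark DataFrame and RDD transformations.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-query-to-spark")
         .enableHiveSupport()        # requires a configured Hive metastore
         .getOrCreate())

# Original HiveQL: SELECT region, SUM(amount) FROM sales GROUP BY region
sales_df = spark.table("sales")
totals_df = (sales_df
             .groupBy("region")
             .agg(F.sum("amount").alias("total_amount")))

# Equivalent RDD-style transformation on the same data
totals_rdd = (sales_df.rdd
              .map(lambda row: (row["region"], row["amount"]))
              .reduceByKey(lambda a, b: a + b))

totals_df.show()
```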
TECHNICAL SKILLS
Hadoop Ecosystem Tools: MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Zookeeper, Oozie, Hue, Storm, Kafka, Spark, Flume
Languages: Java, Core Java, HTML, C, C++
Databases: MySQL, Oracle … SQL Server, MongoDB
Platforms: Linux (RHEL, Ubuntu), OpenSolaris, AIX
Scripting Languages: Shell Scripting, HTML scripting, Python, Puppet
Web Servers: Apache Tomcat, JBoss, Windows Server 2003, 2008, and 2012
Cluster Management Tools: HDP Ambari, Cloudera Manager, Hue, Solr Cloud
PROFESSIONAL EXPERIENCE
Confidential, Mentor OH
Sr. Hadoop Consultant
Responsibilities:
- Working on 4 Hadoop clusters for different teams, supporting 50+ users of the Hadoop platform: resolving tickets and issues they run into, training users to keep Hadoop usage simple, and updating them on best practices.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Managed a 350+ node CDH 5.2 cluster with 4 petabytes of data using Cloudera Manager on Red Hat Enterprise Linux 6.5.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Upgraded the Hadoop cluster from CDH 4.7 to CDH 5.2 and worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Developed Spark scripts to import large files from Amazon S3 buckets and imported data from sources such as HDFS and HBase into Spark RDDs (see the sketch after this list).
- Involved in migrating ETL processes from Oracle to Hive to simplify data manipulation, and worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop.
- Worked on installing Cloudera Manager and CDH, installed the JCE policy files, created a Kerberos principal for the Cloudera Manager Server, and enabled Kerberos using the wizard.
- Developed Spark jobs using Scala and Python on top of Yarn/MRv2 for interactive and Batch Analysis.
- Monitored the cluster for performance, networking, and data integrity issues, and was responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
- Installed the OS and administered the Hadoop stack with the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
- Supported MapReduce programs and distributed applications running on the Hadoop cluster, and scripted Hadoop package installation and configuration to support fully automated deployments.
- Migrated an existing on-premises application to AWS, used AWS services such as EC2 and S3 for processing and storing large data sets, worked with Elastic MapReduce, and set up a Hadoop environment on AWS EC2 instances.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Performed maintenance, monitoring, deployments, upgrades, and optimization across the infrastructure supporting all our Hadoop clusters.
- Created Hive external tables, loaded data into them, queried the data using HQL, and worked with application teams to install operating system updates, Hadoop updates, patches, and version upgrades as required.
- Worked with and learned a great deal from AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
- Monitored and maintained the Hadoop cluster, adding and removing nodes, using Nagios, Ganglia, and Cloudera Manager.
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done in Python (PySpark).
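A minimal PySpark sketch (Spark 2.x API) of the S3/HDFS ingestion referenced earlier in this list; the bucket name, paths, and file layout are hypothetical placeholders.

```python
# Pulling large files from S3 and HDFS into Spark for downstream analysis.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-hdfs-ingest").getOrCreate()

# Large CSV files landed in an S3 bucket (s3a:// requires the hadoop-aws jars
# and AWS credentials configured on the cluster).
s3_df = (spark.read
         .option("header", "true")
         .csv("s3a://example-bucket/landing/customer_events/*.csv"))

# Data already sitting in HDFS, loaded as an RDD of text lines.
hdfs_rdd = spark.sparkContext.textFile("hdfs:///data/financial_histories/part-*")

print(s3_df.count(), hdfs_rdd.count())
```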
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Spark, Spark Streaming, Spark SQL, AWS EMR, AWS S3, AWS Redshift, Python, Scala, PySpark, MapR, Java, Oozie, Flume, HBase, Nagios, Ganglia, Hue, Cloudera Manager, Zookeeper, Cloudera, Oracle, Kerberos, and Red Hat 6.5
Confidential, Chicago IL
Sr. Hadoop Consultant
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Installed and configured Sqoop to import and export data between relational databases and MapR-FS, HBase, and Hive.
- Administered large MapR Hadoop environments: built and supported cluster setup, performance tuning, and monitoring in an enterprise environment.
- Installed and configured MapR-zookeeper, MapR-cldb, MapR-jobtracker, MapR-tasktracker, MapR-resourcemanager, MapR-nodemanager, MapR-fileserver, and MapR-webserver.
- Installed and configured the Knox gateway to secure Hive through ODBC, WebHCat, and Oozie services.
- Loaded data from relational databases into the MapR-FS filesystem and HBase using Sqoop, and set up MapR metrics with a NoSQL database to log metrics data.
- Closely monitored and analyzed MapReduce job execution on the cluster at the task level, and optimized Hadoop cluster components to achieve high performance.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data into HDFS for analysis.
- Integrated HDP clusters with Active Directory and enabled Kerberos for Authentication.
- Worked on commissioning & decommissioning of Data Nodes, NameNode recovery, capacity planning and installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Worked on creating the Data Model for HBase from the current Oracle Data model.
- Implemented high availability and automatic failover to overcome the NameNode single point of failure, utilizing ZooKeeper services.
- Leveraged Chef to manage and maintain builds in various environments, planned hardware and software installation on the production cluster, and coordinated with multiple teams to complete it.
- Monitored Hadoop cluster health through MCS and worked on NoSQL databases including HBase.
- Used Hive, created Hive tables, and was involved in data loading and writing Hive UDFs; worked with the Linux server admin team in administering the server hardware and operating systems.
- Worked closely with data analysts to construct creative solutions for their analysis tasks and managed and reviewed Hadoop and HBase log files.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation, and worked on importing and exporting data between Oracle and HDFS/Hive using Sqoop.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
- Automated workflows using shell scripts to pull data from various databases into Hadoop (see the sketch after this list).
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
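A hypothetical Python analogue of the shell-script workflow automation described above, looping over source tables and invoking Sqoop; the JDBC URL, credentials path, table list, and Hive schema are placeholders, not production values.

```python
# Loop over source tables and land each one in Hive via a Sqoop import.
import subprocess

JDBC_URL = "jdbc:oracle:thin:@//db.example.com:1521/ORCL"
TABLES = ["customers", "accounts", "transactions"]

def sqoop_import(table):
    """Run a sqoop import for one table into a staging Hive database."""
    cmd = [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", "etl_user",
        "--password-file", "/user/etl/.pwd",   # keep credentials off the command line
        "--table", table,
        "--hive-import",
        "--hive-table", f"staging.{table}",
        "--num-mappers", "4",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    for t in TABLES:
        sqoop_import(t)
```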
Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Kafka, Zookeeper, Oozie, Impala, Java, Cloudera, Hortonworks, YARN, Oracle, Teradata, SQL Server, MySQL, Python, UNIX Shell Scripting, ETL, Flume, Scala, Spark, Sqoop, AWS, S3, EC2
Confidential - Atlanta, GA
Hadoop Administrator
Responsibilities:
- Installation, configuration, support, and management of Hadoop clusters using Apache and Cloudera (CDH4) distributions and on Amazon Web Services (AWS).
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Worked on Cloudera Hadoop Upgrades and Patches and Installation of Ecosystem Products through Cloudera manager along with Cloudera Manager Upgrade.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Set up an Amazon Web Services (AWS) EC2 instance for the Cloudera Manager server.
- Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Identified query duplication, complexity, and dependencies to minimize migration effort. Technology stack: Oracle, Hortonworks HDP cluster, Attunity Visibility, Cloudera Navigator Optimizer, AWS Cloud, and DynamoDB.
- Shared responsibility for administration of Hadoop, Hive, and Pig; managed and reviewed Hadoop log files and updated the configuration on each host.
- Worked with the Spark ecosystem using Scala, Python, and Hive queries on different data formats such as text files and Parquet.
- Tested raw data, executed performance scripts, and configured Cloudera Manager Agent heartbeat intervals and timeouts.
- Worked with teams in setting up AWS EC2 instances by using different AWS services like S3, EBS, Elastic Load Balancer, and Auto scaling groups, VPC subnets and CloudWatch.
- Implemented CDH3 Hadoop cluster on RedHat Enterprise Linux 6.4, assisted with performance tuning and monitoring.
- Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages.
- Used the Spark Streaming API with Kafka to build live dashboards (see the sketch after this list); worked on transformations and actions on RDDs, Spark Streaming, pair RDD operations, checkpointing, and SBT.
- Provided reports to management on cluster usage metrics, and worked with HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Worked with different Hadoop distributions: Cloudera (CDH3 & CDH4), Hortonworks (HDP), and MapR.
- Performed installation, upgrade, and configuration tasks for Impala on all machines in the cluster, and supported code/design analysis, strategy development, and project planning.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive and assisted with data capacity planning and node forecasting.
- Managing Amazon Web Services (AWS) infrastructure with automation and configuration.
- Administered Pig, Hive, and HBase, installing updates, patches, and upgrades; performed both major and minor upgrades to the existing CDH cluster, including upgrading the Hadoop cluster from CDH3 to CDH4.
- Developed a process for batch ingestion of CSV files and Sqoop loads from different sources, and generated views on the data sources using shell scripting and Python.
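A minimal PySpark Streaming sketch of the Kafka-fed dashboard pipeline mentioned earlier in this list, written against the older DStream/KafkaUtils API (spark-streaming-kafka-0-8); the broker addresses, topic, and aggregation are hypothetical placeholders.

```python
# Consume a Kafka topic in micro-batches and count events per key.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-dashboard-feed")
ssc = StreamingContext(sc, batchDuration=10)           # 10-second micro-batches
ssc.checkpoint("hdfs:///checkpoints/kafka-dashboard")  # enables stateful recovery

stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["page_views"],
    kafkaParams={"metadata.broker.list": "broker1:9092,broker2:9092"},
)

# Each record is a (key, value) pair; count events per page per batch.
counts = (stream
          .map(lambda kv: (kv[1], 1))
          .reduceByKey(lambda a, b: a + b))
counts.pprint()   # in the real job this fed a dashboard instead of printing

ssc.start()
ssc.awaitTermination()
```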
Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Zookeeper, Impala, Java (JDK 1.6), Cloudera, Hortonworks, Oracle, SQL Server, SQL, UNIX Shell Scripting, Flume, Oozie, Scala, Spark, ETL, Sqoop, Python, Kafka, PySpark, AWS, S3, MongoDB, XML, Red Hat Linux 6.4
Confidential - Louisville, KY
Hadoop Administrator/Developer
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a streaming-style analogue is sketched after this list) and assisted with data capacity planning and node forecasting.
- Involved in the design and ongoing operation of several Hadoop clusters, and configured and deployed the Hive metastore using MySQL and the Thrift server.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Implemented and operated on-premises Hadoop clusters from the hardware to the application layer including compute and storage.
- Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
- Designed custom deployment and configuration automation systems to allow for hands-off management of clusters via Cobbler, FUNC, and Puppet.
- Prepared complete documentation of the knowledge transferred about the Phase-II Talend job design and goals, and documented the support and maintenance procedures to be followed in Talend.
- Deployed the company's first Hadoop cluster running Cloudera's CDH2 on a 44-node cluster storing 160 TB and connected via 1 Gb Ethernet.
- Debugged and resolved major issues with Cloudera Manager by interacting with the Cloudera team.
- Modified reports and Talend ETL jobs based on feedback from QA testers and users in the development and staging environments.
- Handled importing other enterprise data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HBase tables.
- Involved in Cluster Maintenance and removal of nodes using Cloudera Manager.
- Collaborated with application development teams to provide operational support, platform expansion, and upgrades for Hadoop Infrastructure including upgrades to CDH3.
- Participated in the Hadoop development Scrum, and installed and configured Cognos 8.4/10 and Talend ETL in single- and multi-server environments.
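The data-cleaning jobs mentioned at the top of this list were written in Java; the sketch below is a Hadoop Streaming-style analogue in Python, for illustration only. The field layout and validation rules are hypothetical placeholders. It would run as the mapper of a map-only Hadoop Streaming job reading raw CSV from HDFS.

```python
#!/usr/bin/env python
# Map-only cleaning step: drop malformed rows and normalize fields.
import sys

EXPECTED_FIELDS = 5  # e.g. id, timestamp, account, amount, status

def clean(line):
    """Return a normalized tab-separated record, or None if the row is unusable."""
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    if len(fields) != EXPECTED_FIELDS:
        return None                      # drop malformed rows
    if not fields[0] or not fields[3]:
        return None                      # require an id and an amount
    fields[4] = fields[4].upper()        # normalize the status code
    return "\t".join(fields)

if __name__ == "__main__":
    for raw in sys.stdin:
        cleaned = clean(raw)
        if cleaned is not None:
            print(cleaned)
```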
Environment: Apache Hadoop, Cloudera, Pig, Hive, Talend, MapReduce, Sqoop, UNIX, Cassandra, Java, Linux, Oracle 11gR2, UNIX Shell Scripting, Kerberos
Confidential
Sr. Java Developer
Responsibilities:
- Implemented a multi-threaded environment and used most of the interfaces of the collections framework, applying Core Java concepts.
- Developed graphical user interfaces using JSF, JSP, HTML, DHTML, AngularJS, CSS, and JavaScript, and developed scripts in Python for financial data coming from SQL Developer based on the specified requirements.
- Implemented several Java/J2EE design patterns such as Spring MVC, Singleton, Spring Dependency Injection, and Data Transfer Object.
- Used JAX-WS (SOAP) for producing web services, was involved in writing programs to consume web services using SOA with the CXF framework, and developed several web pages using JSP, JSTL, HTML, CSS, JavaScript, Ajax, and JSON.
- Implemented business logic, data exchange, XML processing, and graphics using Python and Django.
- Wrote code to fetch data from web services using jQuery AJAX via JSON responses and update the HTML pages, and developed high-traffic web applications using HTML, CSS, JavaScript, jQuery, Bootstrap, ExtJS, AngularJS, Node.js, and React.js.
- Wrote SQL queries and created PL/SQL functions/procedures/packages optimized for APEX, improving the performance and response times of APEX pages and reports.
- Used the jQuery library, Node.js, and AngularJS to create dynamic web pages and web applications with advanced, cross-browser functionality.
- Used JavaServer Pages for content layout and presentation with Python, and extracted and loaded data using Python scripts and PL/SQL packages.
- Worked with various frameworks of JavaScript like BackboneJS, AngularJS, and EmberJS etc.
- Wrote object-oriented Python using Flask, SQL, Beautiful Soup, httplib2, Jinja2, HTML/CSS, Bootstrap, jQuery, Linux, Sublime Text, and Git.
- Developed the GUI using JSP, Struts, HTML, CSS3, XHTML, jQuery, Swing, and JavaScript to simplify the complexities of the application.
- Wrote and executed various MySQL database queries from Python using the MySQL Connector/Python and MySQLdb packages (see the sketch after this list), generated Python Django forms to record online user data, and used pytest for writing test cases.
- Developed and coordinated complex, high-quality solutions for clients using J2SE, J2EE, Servlets, JSP, HTML, Struts, Spring MVC, SOAP, JavaScript, jQuery, JSON, and XML.
- Exposed business functionality to external systems (interoperable clients) using web services (WSDL/SOAP) with Apache Axis.
- Used Hibernate, an object/relational mapping (ORM) solution, to map the data representation from the MVC model to the Oracle 10g relational data model with a SQL-based schema, mapped using Hibernate annotations.
- Skilled in using collections in Python for manipulating and looping through different user-defined objects.
- Designed and developed intranet web applications using ExtJS, React.js, JavaScript, and CSS, and developed merge jobs in Python to extract and load data into MySQL and MongoDB databases.
- Worked with Oracle and SQL Server as backend databases, integrated with Hibernate to retrieve Data Access Objects.
- Developed Python batch processors to consume and produce various feeds, and developed entire frontend and backend modules, including the business logic, using Python on the Django web framework.
- Designed and documented REST/HTTP APIs, including JSON data formats and an API versioning strategy; wrote Apex triggers and Apex classes and developed Visualforce pages and batch classes.
- Involved in an AJAX-driven application by invoking web services/APIs and parsing the JSON responses, and wrote application-level code to interact with APIs and web services using JSON.
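A hedged sketch of the kind of MySQL access done from Python mentioned earlier in this list, using the mysql-connector-python package; the host, schema, and query are hypothetical placeholders.

```python
# Parameterized query against MySQL from Python.
import mysql.connector

conn = mysql.connector.connect(
    host="db.example.com",
    user="app_user",
    password="secret",
    database="portal",
)

try:
    cursor = conn.cursor()
    # Parameterized to avoid SQL injection.
    cursor.execute(
        "SELECT id, email, created_at FROM online_users WHERE created_at >= %s",
        ("2015-01-01",),
    )
    for user_id, email, created_at in cursor.fetchall():
        print(user_id, email, created_at)
finally:
    conn.close()
```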
Environment: Python, Django, Java, JSF MVC, Spring IoC, APEX, Ruby on Rails, Spring JDBC, Hibernate, ActiveMQ, Log4j, Ant, MySQL, JDK 1.6, J2EE, JSP, Servlets, HTML, LDAP, Salesforce, ESB Mule, JDBC, MongoDB, DAO, EJB 3.0, PL/SQL, React.js, WebSphere, Eclipse, AngularJS, and CVS.
Confidential
Sr. Java Developer
Responsibilities:
- Used Spring MVC to build easy REST-based JSON services and developed server-side components using the Spring MVC framework.
- Developed user interfaces with Java Server Pages using HTML, DHTML, XHTML, AJAX, CSS, and JavaScript.
- Configured Spring JDBC for database management and was responsible for testing classes and methods using JUnit test cases.
- Developed Graphical User Interfaces using HTML, XML/XSLT and JSP's for user interaction and CSS for styling.
- Updated pages using HTML and CSS in the AngularJS framework, and developed SOAP (JAX-WS) web service applications using the contract-last approach.
- Extensively developed stored procedures, triggers, functions and packages in Oracle SQL, PL/SQL.
- Designed standalone application using Scene Builder, JavaFX, and CSS.
- Wrote Hibernate configuration file, Hibernate mapping files and defined persistence classes to persist the data into Oracle Database.
- Configured the Hibernate session factory to integrate Hibernate with Spring, and employed Spring JDBC to implement batch jobs that pull organization-structure data.
- Developed JavaBeans for the Forms and Action classes for Struts framework.
- Generated reports based on complex SQL queries within Perl scripts for use by the marketing and sales departments.
- Used Eclipse as the IDE for developing the applications, and developed SOAP-based web services using Apache, Spring, and Hibernate.
- Designed, developed, and integrated REST-based web services into the AutoQuote application.
- Used J2EE design patterns such as Factory, MVC, Facade, DAO, and Singleton.
- Used JDBC to retrieve data from Oracle database and developed build scripts using Ant.
- Built components, scheduling, and configuration using Maven 2.
Environment: C#, OOAD, Java 1.6, J2EE, HTML, XHTML, CSS, Angular, JavaScript, AJAX, jQuery, Spring 3.0, Maven 2, JPA, JSP, JAX-WS, SOAP UI, SVN, JBoss, Spring MVC, JUnit 4, Oracle, PL/SQL.