Hadoop Consultant Resume Chicago, IL - Hire IT People

SUMMARY:

Over 7 years of professional IT experience in analysis, design, and development using Hadoop, Java/J2EE, and SQL.
5+ years of experience developing large-scale applications using Hadoop and other Big Data tools.
Created a complete processing engine based on the Cloudera distribution.
Hands-on experience with production Hadoop applications, including administration, configuration management, debugging, and performance tuning.
Experience developing solutions to analyze large data sets efficiently, secured with Kerberos.
Experience with the Hadoop 2.0 YARN (MRv2) architecture and developing YARN applications on it.
Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
Extensive hands-on experience importing and exporting data between relational database systems such as MySQL and Oracle and HDFS/Hive using Sqoop.
Knowledge of Kafka installation and integration with Spark Streaming.
Hands-on experience writing Pig Latin scripts, working with the Grunt shell, and scheduling jobs with Oozie.
Excellent understanding of Hadoop distributed system architecture and design principles.
Installed, configured, and optimized Hadoop infrastructure using the Cloudera Hadoop distribution (CDH5) with Puppet.
Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them according to their recommendations.
Experience converting MapReduce applications to Spark.
Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
Good knowledge of job scheduling and workflow design tools such as Oozie.
Experience performance tuning Hadoop clusters by gathering and analyzing data on the existing infrastructure.
Experience in Hadoop administration activities such as installing and configuring clusters using Cloudera Manager, Hortonworks, Presto, and Ambari.
Good experience creating real-time data streaming solutions using Spark/Storm, Kafka, and Flume.
Very good understanding of NoSQL databases such as MongoDB and HBase.
Extensive experience creating class diagrams, activity diagrams, and sequence diagrams using the Unified Modeling Language (UML).
Extended Hive and Pig core functionality by writing custom UDFs.
Installed and configured Hadoop, MapReduce, and HDFS; developed multiple MapReduce jobs in Java for data cleaning; and upgraded Cloudera from version 5.5 to 6.0.
Good understanding of Data Mining and Machine Learning techniques.
Experience handling messaging services using Apache Kafka.
Experience fine-tuning MapReduce jobs for better scalability and performance.
Created custom Python/shell scripts to import data via Sqoop from various SQL databases such as Teradata, SQL Server, and Oracle.
Experience with NoSQL databases such as HBase and Cassandra.
Experience writing SQL and PL/SQL queries and stored procedures for accessing and managing databases such as Oracle, SQL Server, and MySQL.
Working experience in development, production, and QA environments.
Experienced in the SDLC and in Agile (Scrum) and iterative Waterfall methodologies.
Well experienced in building servers such as DHCP, PXE with Kickstart, DNS, and NFS, using them to build infrastructure in Linux environments, and working with Puppet for application deployment.
Experienced in Linux administration tasks such as IP management (IP addressing, subnetting, Ethernet bonding, and static IPs).
Good communication and interpersonal skills, a committed team player and a quick learner.

TECHNICAL SKILLS:

Big Data Technologies: Apache Hadoop, MapReduce, Cloudera 4.3.2, HDFS, Cloudera Impala, Hortonworks, Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Hue, Presto, Ranger, Zeppelin, Oozie, Kerberos.

Languages: Core Java, J2EE, SQL, PL/SQL, Unix Shell Scripting, Perl, Python, Shell.

Web Technologies: JSP, EJB 2.0, JNDI, JMS, JDBC, HTML, JavaScript

Web/Application servers: Tomcat, JBoss 5.1.0

Databases: Oracle 11g/10g, SQL Server, DB2, Sybase, Teradata

Frameworks: Hadoop, MapReduce, MVC, Struts 2.x/1.x

IDE: IntelliJ IDEA 7.2, EditPlus 3, Eclipse 3.5, NetBeans 6.5, TOAD, PL/SQL, Teradata

Version Control: Visual SourceSafe (VSS), Subversion, CVS

Testing Technologies: JUnit 4/3.8

Office Packages: MS Office 2010, 2007, and 2003; Visio

Operating Systems: MS-DOS, Windows XP, Windows 7, UNIX and Linux

PROFESSIONAL EXPERIENCE:

Hadoop Consultant

Confidential, Chicago, IL

Responsibilities:

Supported 200+ servers and 50+ users on the Hadoop platform: resolved the tickets and issues they ran into, provided training to make Hadoop simple to use, and kept them updated on best practices.
Installed, configured, and optimized Hadoop infrastructure using the Cloudera Hadoop distribution (CDH5) with Puppet.
Good understanding of the architecture, configuration, and security of Cassandra with Falcon, as well as Cassandra's data read and write paths.
Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
Developed simple and complex MapReduce programs in Java for Data Analysis.
Transferred data from HDFS to MongoDB using Pig, Hive, and MapReduce scripts and visualized the streaming data in Tableau dashboards.
Managed and reviewed Hadoop log files, handled file system management, monitored the Hadoop cluster, and performed capacity planning.
Monitored workload and job performance and performed capacity planning using Cloudera Manager.
Hands-on experience working with ecosystem components such as Hive, Pig scripts, Sqoop, MapReduce, YARN, and ZooKeeper. Strong knowledge of Hive's analytical functions.
Wrote Flume configuration files to store streaming data in HDFS.
Upgraded Kafka from 0.8.2.2 to 0.9.0.0.
As an admin, was involved in cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
Performed analytics on HDFS data using MapReduce, Hive, and Pig, sent the results back to MongoDB databases, and updated information in collections.
Used a RESTful web services API to connect to the MapR table, and was involved in creating the database connection through that RESTful web services API.
Currently working as a Hadoop Admin, responsible for everything related to the clusters, a total of 100 nodes ranging from POC to PROD clusters.
Built Cassandra clusters on both physical machines and AWS; automated Cassandra builds, installation, monitoring, etc.
Installed and configured OpenStack Juno on Red Hat 6 with multiple compute nodes.
Loaded data from the UNIX file system into HDFS, and created custom Solr query components to enable optimum search matching.
Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
Installed and configured the CDH Hadoop environment; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
Used Jira for project tracking, bug tracking, and project management. Loaded data from various data sources into HDFS using Kafka.
Used Python scripts to update content in the database and manipulate files.
Generated Python Django forms to record data of online users.
Installed Kafka on the Hadoop cluster and coded the producer and consumer in Java to establish a connection from a Twitter source to HDFS for popular hashtags.
Implemented partitioning, dynamic partitions, and buckets in Hive, and wrote Pig Latin scripts for the analysis of semi-structured data.
Involved in requirements analysis, design, development, data design and mapping, extraction, validation, and the creation of complex business requirements.

Environment: Cloudera 4.3.2, CDH 4.7, Hadoop 2.0.0, HDFS, MapReduce, Hue, MongoDB 2.6, Hive 0.10, Sqoop 1.4.3, Oozie 3.3.4, Jira, Zeppelin, WebLogic 8.1, Kafka, Ranger, YARN, Falcon, Kerberos, Impala, Pig, Python scripting, MySQL, Perl, Red Hat Linux, CentOS and other UNIX utilities, Cloudera Manager.

Hadoop Administrator

Confidential, Minneapolis, MN

Responsibilities:

Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
Installed and configured Sqoop to import data from relational databases into MapR-FS, HBase, and Hive, and to export it back.
Administered large MapR Hadoop environments: built and supported cluster setup, performance tuning, and monitoring in an enterprise environment.
Installed and configured MapR-zookeeper, MapR-cldb, MapR-job tracker, MapR-task tracker, MapR-resource manager, MapR-node manager, MapR-fileserver, and MapR-webserver.
Worked independently with Cloudera support on any issues or concerns with the Hadoop cluster.
Point of contact for vendor escalation; upgraded Cloudera Manager from version 5.3 to 5.5.
Installed and configured the Knox gateway to secure Hive through ODBC, WebHCat, and Oozie services.
Loaded data from relational databases into the MapR-FS file system and HBase using Sqoop.
Set up MapR metrics with a NoSQL database to log metrics data.
Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
Optimized Hadoop cluster components to achieve high performance.
Monitored multiple Hadoop cluster environments using Ganglia and Nagios.
Integrated HDP clusters with Active Directory and enabled Kerberos for Authentication.
Worked on commissioning and decommissioning DataNodes, NameNode recovery, and capacity planning.
Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
Created the data model for HBase from the current Oracle data model.
Implemented high availability and automatic failover infrastructure using ZooKeeper services to overcome the NameNode single point of failure.
Leveraged Chef to manage and maintain builds in various environments.
Planned hardware and software installation on the production cluster and coordinated with multiple teams to complete it.
Performed troubleshooting, fixed and deployed many Python bug fixes for the applications, and fine-tuned existing processes following advanced patterns and methodologies.
Monitored Hadoop cluster functioning through the MapR Control System (MCS).
Worked on NoSQL databases including HBase.
Created Hive tables, loaded data into them, and wrote Hive UDFs.
Worked with the Linux server admin team to administer server hardware and operating systems.
Worked closely with data analysts to construct creative solutions for their analysis tasks.
Managed and reviewed Hadoop and HBase log files.
Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
Imported and exported data between Oracle and HDFS/Hive using Sqoop.
Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
Automated workflows using shell scripts to pull data from various databases into Hadoop.
Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.

Environment: Hadoop 1.2.1, MapReduce, Hive 0.10.0, Perl, Pig 0.11.1, Kerberos, Ranger, Presto, Hue, Oozie 3.3.0, HBase 0.94.11, Sqoop 1.4.4, Flume 1.4.0, Zeppelin, Java, Python, SQL, PL/SQL, Oracle 10g, Eclipse

Hadoop Administrator/Tester

Confidential

Responsibilities:

Experienced in setting up Hortonworks clusters and installing all ecosystem components through Ambari and manually from the command line.
Performed cluster maintenance, monitoring, commissioning and decommissioning of DataNodes, troubleshooting, and managing and reviewing log files.
Actively involved in installation, performance tuning, patching, regular backups, user account administration, upgrades, and documentation.
Installed and removed components through Ambari.
Configured ZooKeeper to coordinate the servers in clusters and maintain data consistency.
Used Cloudera Navigator for data governance: audit and lineage.
Periodically reviewed Hadoop-related logs and fixed errors.
Commissioned new cluster nodes for increased capacity and decommissioned servers with hardware problems.
Responsible for adding new ecosystem components such as Storm, Flume, and Knox, with the required custom configurations for the Hadoop daemons based on requirements.
Developed Python, shell, and PowerShell scripts for automation.
Implemented Kerberos authentication for the existing cluster.
Worked with Ranger and Knox configuration to provide centralized security for Hadoop services.
Created independent Python libraries that can be used by multiple projects with common functionality.
Hands-on experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
Maintained MySQL databases: created databases, set up users, and maintained database backups.
Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes.
Performed Linux systems administration on production and development servers (Red Hat Linux, CentOS, and other UNIX utilities).

Environment: Hadoop HDFS, MapReduce, Hortonworks, Falcon, Cloudera, Ambari, Ranger, Knox, Puppet, Hive, Pig, Kafka, Oozie, Sqoop, Shell, Python, MongoDB, Apache HBase.

Hadoop Admin

Confidential

Responsibilities:

Solid understanding of Hadoop HDFS, MapReduce, and other ecosystem projects.
Installation and configuration of Hadoop clusters.
The plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly; it also provides data locality for Hadoop across host nodes and virtual machines.
Developed MapReduce jobs to analyze data and provide heuristic reports.
Adding, decommissioning, and rebalancing nodes.
Created a POC to store server log data in Cassandra to identify system alert metrics.
Rack-aware configuration.
Configuring client machines.
Configuring monitoring and management tools.
HDFS support and maintenance.
Cluster HA setup.
Applying patches and performing version upgrades.
Incident management, problem management, and change management.
Performance management and reporting.
Contributed to building hands-on tutorials for the community on how to set up Hortonworks Data Platform (powered by Hadoop) and Hortonworks DataFlow.
Designed and developed an automation framework using Python and shell scripting.
Recovering from NameNode failures.
Scheduling MapReduce jobs with the FIFO and Fair schedulers.
Installation and configuration of other open-source software such as Pig, Hive, HBase, Flume, and Sqoop.
Integration with RDBMS using Sqoop and JDBC connectors.
Working with the dev team to tune jobs; knowledge of writing Hive jobs.

Environment: HDP 2.2, Python, HBase, Kafka, HDFS, YARN, Hortonworks, MongoDB, Hive, Oozie, Pig, Sqoop, Shell Scripting, MySQL, RHEL, CentOS, Ambari.

Linux Administrator

Confidential

Responsibilities:

Configuring and tuning system and network parameters for optimum performance.
Developed troubleshooting and problem-solving skills, including application- and network-level troubleshooting.
Gained knowledge and experience writing shell scripts to automate tasks.
Identifying and triaging outages; monitoring and remediating system and network performance.
Installing, upgrading, and managing Hadoop clusters on Hortonworks.
Developing tools to automate the deployment, administration, and monitoring of a large-scale Linux environment.
Performing server tuning, operating system upgrades.
Participating in the planning phase for system requirements on various projects for deployment of business functions.
Participating in 24x7 on-call rotation and maintenance windows.
Communication & coordination with internal / external groups and operations.

Environment: Windows Server 2008/2007, Unix Shell Scripting, SQL Server Management Studio, Red Hat Linux, Microsoft SQL Server 2000/2005/2008, MS Access, Hortonworks, NoSQL, Linux/Unix, PuTTY Connection Manager, PuTTY, SSH.
