Big Data Consultant Resume MINNEAPOLIS, MN - Hire IT People

SUMMARY

Results - driven analytics professional with excellent technology background and strong business acumen coupled with 4+ years of working with Big Data technology stack and 7+ years of overall experience in business modeling, requirements gathering and management through all phases of SDLC.
Demonstrable experience in Big Data space and challenges in multiple verticals (Finance and Healthcare) through a strategic roadmap and thorough data understanding for data collection, transformation, warehousing and analysis.
Excellent understanding / knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and Map Reduce programming paradigm.
Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop Map Reduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
Experience in managing and reviewing Hadoop log files.
Experience in analyzing data using HiveQL, Pig Latin, HBase and custom Map Reduce programs in Java.
Extending Hive and Pig core functionality by writing customUDFs.
Experience in installation, configuration, supporting and managing - Cloudera's Hadoop platform along with CDH3&4 clusters.
Involved in Installing and configuring Kerberos for the authentication of users and hadoop daemons.
Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
Experience in developing test plan, cases, and procedures.
Able to recognize test cases deficiencies and propose improvements.
Extensively worked on database applications using DB2 UDB, Oracle, SQL*Plus, PL/SQL, SQL*Loader.
Experienced in gathering business requirements, Market Research, Testing Strategies, Business Documentation, Cross-Functional Team Collaboration, SDLC, and Big Data Analytics.
Understands the components of running a fiscally successful project.

TECHNICAL SKILLS

Data Collection: Hadoop, HDFS, Sqoop, Flume, Kafka, Hadoop Streaming

Data Transformation/Analysis: Hive, Pig, R

Data Warehousing: NoSQL (Hbase, Cassandra), SQL, T-SQL

Cluster Management and Security: Zookeeper, Oozie, Cloudera Manager (CDH), Ambari, HDP2.2 - 2.4, Kerberos, Nagios, NMON, and Ganglia

Data Visualization: Tableau

Data Formats: XML, Xcel, Flat files, JSON.

Healthcare: HIPAA, HL7, ICD 9/10, Health Information Exchange, Medical Device Software methodology (IEC 62304), Claims Processing System, HL7, EHR/EMR

Business Analysis: Requirements elicitation and gathering, Technical documentation, Market Research, Testing Strategies, Business Documentation, Cross-Functional Team Collaboration, SDLC, Technology Roadmap creation, JAD Sessions, Gap Analysis, UML techniques and diagrams using IBM Rational Suite.

Interests: Real-time analytics through massive large data processing, IOT, R, Data-driven decision making leveraging business intelligence and Statistics.

Project Management Methodologies: UML, SDLC, RUP, AGILE/SCRUM, WATERFALL

Big Data Technologies: HDFS, Map Reduce, PIG, Hive, HBase, Zookeeper, Sqoop, Flume, Storm Hadoop Streaming, CDH, HDP 2.2, Oozie, Kafka

Scripting Languages: Python, R

Design Architecture / OS: Windows, Unix

Reporting /Analytics: SQL Reporting Services (SSRS), Tableau, Teradata SQL Assistant, Teradata Studio Express

Languages: C, C++, Java, SQL, HTML, Perl, XML

Version Control/Bug Control: Git, CVS, Perforce, Jenkins, JIRA, Clear Quest, Mantis

PROFESSIONAL EXPERIENCE

Confidential, MINNEAPOLIS, MN

Big Data Consultant

Responsibilities:

Worked on evaluation and analysis of Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
Developed PIG Latin scripts to extract the data from the local file system output files to load into HDFS
Importing and exporting data into HDFS and Hive using Sqoop upto 500GB a day.
Collaborated with decision scientist’s team to execute a successful Proof of concept for Apache SOLR search engine to perform indexed fast retrieval of persisted data close to 200TB.
Worked in seamless transitioning from batch processing to real-time kafka queue based ingestion process.
Responsible for cluster detail layout and performance optimization.
Monitored multiple hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Cloudera Manager.
Involved in Hadoop cluster tasks of commissioning and decommissioning nodes without any effect to running jobs and data escalating the cluster size to 250 nodes with HA namenode.
Orchestrated strategic processes for setting up data retention and archival policies for compliance and analytical needs.
Designed a process in Big Data to comply with Compliance and regulatory data integrity needs.
Installation/Configuration/Maintenance of Hadoop clusters for application development using Hadoop tools like Hive, Pig, Hbase, Sqoop and Flume
Worked on Migration from Hadoop 2.2 to Hadoop 2.4 via HDP versions
Data capacity planning and nodes forecasting.
Defining the big data platform architecture and Data strategy for ramping up the data ingestion needs without impacting continuous data flow.
Created SIT test plans and conducted functional/UAT testing in end-to-end lower environments leveraging industry standard process (PDDT) for using PROD data.
Shared responsibility for administration of Hadoop, Hive and Pig.
Worked in Data Lake Architecture and converted vast amount of raw data into required data formats and stored into HDFS.
Performed operations like Delta processing; Validations & Transformations on hive tables and HDFS.
Usage of Sqoop to import data into HDFS from SQL database
Load and transform large sets of structured, semi structured using Hive and run analytics on them.
Written impala queries for managing the data.

Confidential, NORTH CHICAGO, IL

Hadoop Administrator

Responsibilities:

Responsible for the installation, configuration, maintenance and troubleshooting of Hadoop Cluster. Duties included monitoring cluster performance using various tools to ensure the availability, integrity and confidentiality of application and equipment.
Deployed the Hadoop cluster in cloud environment with scalable nodes as per the business requirement.
Create, view and publish standard and custom reports.
View reports according to data access permissions.
Maintain and see historical trends for defined periods.
Supporting Multi-tenancy.
Addressed integration issues with other components in analytics engine.
Acceptance testing of all the different applications.
Experience in commissioning and decommissioning nodes of Hadoop cluster.
Imported the data from relational databases into HDFS using Sqoop.
Involved in creating Hive tables, loading data, and writing Hive queries.
Involved in upgrading Hadoop cluster from current version to minor version upgrade as well as to major versions.
Developed PIG and HIVE scripting for data processing on HDFS.
Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
Supported technical team members for automation, installation and configuration tasks.
Participated in evaluation and selection of new technologies to support system efficiency.
Work with the data center planning groups, assisting with network capacity and high availability requirements.

Confidential, ST PAUL, MN

Hadoop Administrator

Responsibilities:

Build a production and development Hadoop cluster using Apache Hadoop 2.0
Configure the cluster properties to gain the high cluster performance by taking cluster hardware configuration as key criteria.
Implemented HA services on the Hadoop clusters to make Hadoop services highly available.
Implemented the resource monitoring tool Ganglia on the Hadoop clusters.
Deployed the Hadoop cluster using Kerberos to provide secure access to the cluster.
Responsible for importing the data (mostly log files) from various sources into HDFS using Flume.
Installed and configure Zookeeper service for coordinating configuration-related information of all the nodes in the cluster to manage it efficiently.
Used PIG and HIVE to generate BI reports.
Used Jenkins for continuous integration with version control and automating jobs.
Configuring, installing, managing and administrating HBase clusters.
Dump the data from SQL Server to HDFS using Sqoop.
Maintain extensive documentation on Hadoop cluster, policies, configurations and job flows.
Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
Importing and exporting data into HDFS using Sqoop.
Experienced in defining job flows with Oozie.
Loading logs data directly into HDFS using Flume.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
Worked on NoSQL databases including HBase and Cassandra.
Created Hive External tables and loaded the data in to tables and query data using HQL.
Wrote shell scripts for rolling day-to-day processes and it is automated.
Automated workflows using shell scripts pull data from various databases into Hadoop
Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.

Confidential, INDIANAPOLIS, IN

Hadoop Administrator

Responsibilities:

Specifying the Cluster size, allocating Resource pool and monitoring of jobs.
Configured the Hive set up.
Export the result set from one SQL server to another MySQL using Sqoop.
Helped in the HIVE queries for the analysts.
Helped the team to increase Cluster from 25 Nodes to 40 Nodes.
Maintain System integrity of all sub-components across the multiple nodes in the cluster.
Monitor Cluster health and clean up logs when required.
Perform upgrades and configuration changes.
Upgrading the Hadoop Cluster from CDH3 to CDH4 and setup High availability Cluster
Integrate the HIVE with existing applications
Commission/decommission Nodes as needed.
Manage resources in a multi-tenancy environment.
Configured the Zookeeper in setting up the HA Cluster
Implemented Fair schedulers on the Job tracker to share the resources of the Cluster for the Map Reduce jobs given by the users.
Set up the compression for different volumes in the cluster.
Configured Ganglia to send the required job metrics for chargeback.

Confidential, NY

System Administrator

Responsibilities:

Migrating and daily data-backup in a network computing environment along with documentation of recycled data log reports in excel format.
Managing Systems operations with final accountability for smooth installation, networking, and operation, troubleshooting of hardware and software in LINUX environment.
Identifying operational needs of various departments and developing customized software to enhance System's productivity.
Established/implemented firewall rules, Validated rules with vulnerability scanning tools.
Accomplished System/e-mail authentication using LDAP enterprise Database.
Implemented a Database enabled Intranet web site using LINUX, Apache, MySQL Database backend, HTML and PHP scripting for Database access via web browsers.
Monitoring System Metrics and logs for any problems.

We provide IT Staff Augmentation Services!

Big Data Consultant Resume

Minneapolis, MN

We'd love your feedback!

Resume Categories

Client Services

Job Seekers

Visa Sponsorship