Big Data Consultant Resume

Minneapolis, MN

SUMMARY

  • Results-driven analytics professional with an excellent technology background and strong business acumen, coupled with 4+ years of working with the Big Data technology stack and 7+ years of overall experience in business modeling, requirements gathering, and management through all phases of the SDLC.
  • Demonstrable experience in Big Data space and challenges in multiple verticals (Finance and Healthcare) through a strategic roadmap and thorough data understanding for data collection, transformation, warehousing and analysis.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Experience in extending Hive and Pig core functionality by writing custom UDFs.
  • Experience in installing, configuring, supporting, and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
  • Installed and configured Kerberos for the authentication of users and Hadoop daemons.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Experience in developing test plans, cases, and procedures.
  • Able to recognize test case deficiencies and propose improvements.
  • Extensively worked on database applications using DB2 UDB, Oracle, SQL*Plus, PL/SQL, SQL*Loader.
  • Experienced in gathering business requirements, Market Research, Testing Strategies, Business Documentation, Cross-Functional Team Collaboration, SDLC, and Big Data Analytics.
  • Understands the components of running a fiscally successful project.

TECHNICAL SKILLS

Data Collection: Hadoop, HDFS, Sqoop, Flume, Kafka, Hadoop Streaming

Data Transformation/Analysis: Hive, Pig, R

Data Warehousing: NoSQL (HBase, Cassandra), SQL, T-SQL

Cluster Management and Security: ZooKeeper, Oozie, Cloudera Manager (CDH), Ambari, HDP 2.2-2.4, Kerberos, Nagios, NMON, and Ganglia

Data Visualization: Tableau

Data Formats: XML, Excel, Flat files, JSON

Healthcare: HIPAA, HL7, ICD 9/10, Health Information Exchange, Medical Device Software methodology (IEC 62304), Claims Processing System, EHR/EMR

Business Analysis: Requirements elicitation and gathering, Technical documentation, Market Research, Testing Strategies, Business Documentation, Cross-Functional Team Collaboration, SDLC, Technology Roadmap creation, JAD Sessions, Gap Analysis, UML techniques and diagrams using IBM Rational Suite.

Interests: Real-time analytics through large-scale data processing, IoT, R, Data-driven decision making leveraging business intelligence and statistics.

Project Management Methodologies: UML, SDLC, RUP, AGILE/SCRUM, WATERFALL

Big Data Technologies: HDFS, MapReduce, Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Storm, Hadoop Streaming, CDH, HDP 2.2, Oozie, Kafka

Scripting Languages: Python, R

Design Architecture / OS: Windows, Unix

Reporting/Analytics: SQL Reporting Services (SSRS), Tableau, Teradata SQL Assistant, Teradata Studio Express

Languages: C, C++, Java, SQL, HTML, Perl, XML

Version Control/Bug Control: Git, CVS, Perforce, Jenkins, JIRA, ClearQuest, Mantis

PROFESSIONAL EXPERIENCE

Confidential, MINNEAPOLIS, MN

Big Data Consultant

Responsibilities:

  • Evaluated and analyzed the Hadoop cluster and various big data analytics tools, including Pig, HBase, and Sqoop.
  • Developed Pig Latin scripts to extract data from output files on the local file system and load it into HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop, up to 500 GB a day.
  • Collaborated with the decision scientists' team to execute a successful proof of concept for the Apache Solr search engine, providing fast indexed retrieval of close to 200 TB of persisted data.
  • Worked on a seamless transition from batch processing to a real-time, Kafka queue-based ingestion process.
  • Responsible for cluster detail layout and performance optimization.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Commissioned and decommissioned Hadoop cluster nodes without affecting running jobs or data, scaling the cluster to 250 nodes with an HA NameNode.
  • Orchestrated strategic processes for setting up data retention and archival policies for compliance and analytical needs.
  • Designed a Big Data process to meet compliance and regulatory data integrity requirements.
  • Installed, configured, and maintained Hadoop clusters for application development using Hadoop tools such as Hive, Pig, HBase, Sqoop, and Flume.
  • Worked on migration from Hadoop 2.2 to Hadoop 2.4 via HDP versions.
  • Performed data capacity planning and node forecasting.
  • Defined the big data platform architecture and data strategy for ramping up data ingestion without impacting continuous data flow.
  • Created SIT test plans and conducted functional/UAT testing in end-to-end lower environments leveraging industry standard process (PDDT) for using PROD data.
  • Shared responsibility for administration of Hadoop, Hive and Pig.
  • Worked on a data lake architecture, converting large volumes of raw data into the required formats and storing them in HDFS.
  • Performed delta processing, validations, and transformations on Hive tables and HDFS.
  • Used Sqoop to import data into HDFS from a SQL database.
  • Loaded and transformed large sets of structured and semi-structured data using Hive and ran analytics on them.
  • Wrote Impala queries for managing the data.

Confidential, NORTH CHICAGO, IL

Hadoop Administrator

Responsibilities:

  • Responsible for the installation, configuration, maintenance and troubleshooting of Hadoop Cluster. Duties included monitoring cluster performance using various tools to ensure the availability, integrity and confidentiality of application and equipment.
  • Deployed the Hadoop cluster in a cloud environment with scalable nodes as per the business requirements.
  • Created, viewed, and published standard and custom reports.
  • Viewed reports according to data access permissions.
  • Maintained and reviewed historical trends for defined periods.
  • Supported multi-tenancy.
  • Addressed integration issues with other components in analytics engine.
  • Performed acceptance testing of the different applications.
  • Commissioned and decommissioned nodes of the Hadoop cluster.
  • Imported the data from relational databases into HDFS using Sqoop.
  • Involved in creating Hive tables, loading data, and writing Hive queries.
  • Involved in upgrading the Hadoop cluster across both minor and major versions.
  • Developed Pig and Hive scripts for data processing on HDFS.
  • Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
  • Supported technical team members for automation, installation and configuration tasks.
  • Participated in evaluation and selection of new technologies to support system efficiency.
  • Worked with the data center planning groups, assisting with network capacity and high availability requirements.

Confidential, ST PAUL, MN

Hadoop Administrator

Responsibilities:

  • Built production and development Hadoop clusters using Apache Hadoop 2.0.
  • Configured cluster properties for high performance, using the cluster hardware configuration as the key criterion.
  • Implemented HA services on the Hadoop clusters to make Hadoop services highly available.
  • Implemented the resource monitoring tool Ganglia on the Hadoop clusters.
  • Deployed the Hadoop cluster using Kerberos to provide secure access to the cluster.
  • Responsible for importing the data (mostly log files) from various sources into HDFS using Flume.
  • Installed and configured the ZooKeeper service to coordinate configuration-related information across all nodes in the cluster and manage it efficiently.
  • Used Pig and Hive to generate BI reports.
  • Used Jenkins for continuous integration with version control and automating jobs.
  • Installed, configured, managed, and administered HBase clusters.
  • Dumped data from SQL Server to HDFS using Sqoop.
  • Maintained extensive documentation on the Hadoop cluster, policies, configurations, and job flows.
  • Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
  • Imported and exported data into HDFS using Sqoop.
  • Defined job flows with Oozie.
  • Loaded log data directly into HDFS using Flume.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
  • Worked on NoSQL databases including HBase and Cassandra.
  • Created Hive external tables, loaded data into them, and queried the data using HQL.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Automated workflows using shell scripts that pull data from various databases into Hadoop.
  • Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.

Confidential, INDIANAPOLIS, IN

Hadoop Administrator

Responsibilities:

  • Specified the cluster size, allocated resource pools, and monitored jobs.
  • Configured the Hive setup.
  • Exported result sets from a SQL Server database to a MySQL database using Sqoop.
  • Helped analysts with Hive queries.
  • Helped the team grow the cluster from 25 nodes to 40 nodes.
  • Maintained system integrity of all sub-components across the multiple nodes in the cluster.
  • Monitored cluster health and cleaned up logs when required.
  • Performed upgrades and configuration changes.
  • Upgraded the Hadoop cluster from CDH3 to CDH4 and set up a high-availability cluster.
  • Integrated Hive with existing applications.
  • Commissioned and decommissioned nodes as needed.
  • Managed resources in a multi-tenancy environment.
  • Configured ZooKeeper as part of setting up the HA cluster.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Set up the compression for different volumes in the cluster.
  • Configured Ganglia to send the required job metrics for chargeback.

Confidential, NY

System Administrator

Responsibilities:

  • Performed data migration and daily data backups in a network computing environment, and documented recycled data log reports in Excel format.
  • Managed systems operations with final accountability for smooth installation, networking, operation, and troubleshooting of hardware and software in a Linux environment.
  • Identified operational needs of various departments and developed customized software to enhance system productivity.
  • Established and implemented firewall rules and validated them with vulnerability scanning tools.
  • Implemented system and e-mail authentication using an enterprise LDAP database.
  • Implemented a database-enabled intranet web site using Linux, Apache, a MySQL database backend, HTML, and PHP scripting for database access via web browsers.
  • Monitored system metrics and logs for problems.
