Big Data Consultant Resume
Minneapolis, MN
SUMMARY
- Results - driven analytics professional with excellent technology background and strong business acumen coupled with 4+ years of working with Big Data technology stack and 7+ years of overall experience in business modeling, requirements gathering and management through all phases of SDLC.
- Demonstrable experience in Big Data space and challenges in multiple verticals (Finance and Healthcare) through a strategic roadmap and thorough data understanding for data collection, transformation, warehousing and analysis.
- Excellent understanding / knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and Map Reduce programming paradigm.
- Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop Map Reduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
- Experience in managing and reviewing Hadoop log files.
- Experience in analyzing data using HiveQL, Pig Latin, HBase and custom Map Reduce programs in Java.
- Extending Hive and Pig core functionality by writing customUDFs.
- Experience in installation, configuration, supporting and managing - Cloudera's Hadoop platform along with CDH3&4 clusters.
- Involved in Installing and configuring Kerberos for the authentication of users and hadoop daemons.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Experience in developing test plan, cases, and procedures.
- Able to recognize test cases deficiencies and propose improvements.
- Extensively worked on database applications using DB2 UDB, Oracle, SQL*Plus, PL/SQL, SQL*Loader.
- Experienced in gathering business requirements, Market Research, Testing Strategies, Business Documentation, Cross-Functional Team Collaboration, SDLC, and Big Data Analytics.
- Understands the components of running a fiscally successful project.
TECHNICAL SKILLS
Data Collection: Hadoop, HDFS, Sqoop, Flume, Kafka, Hadoop Streaming
Data Transformation/Analysis: Hive, Pig, R
Data Warehousing: NoSQL (Hbase, Cassandra), SQL, T-SQL
Cluster Management and Security: Zookeeper, Oozie, Cloudera Manager (CDH), Ambari, HDP2.2 - 2.4, Kerberos, Nagios, NMON, and Ganglia
Data Visualization: Tableau
Data Formats: XML, Xcel, Flat files, JSON.
Healthcare: HIPAA, HL7, ICD 9/10, Health Information Exchange, Medical Device Software methodology (IEC 62304), Claims Processing System, HL7, EHR/EMR
Business Analysis: Requirements elicitation and gathering, Technical documentation, Market Research, Testing Strategies, Business Documentation, Cross-Functional Team Collaboration, SDLC, Technology Roadmap creation, JAD Sessions, Gap Analysis, UML techniques and diagrams using IBM Rational Suite.
Interests: Real-time analytics through massive large data processing, IOT, R, Data-driven decision making leveraging business intelligence and Statistics.
Project Management Methodologies: UML, SDLC, RUP, AGILE/SCRUM, WATERFALL
Big Data Technologies: HDFS, Map Reduce, PIG, Hive, HBase, Zookeeper, Sqoop, Flume, Storm Hadoop Streaming, CDH, HDP 2.2, Oozie, Kafka
Scripting Languages: Python, R
Design Architecture / OS: Windows, Unix
Reporting /Analytics: SQL Reporting Services (SSRS), Tableau, Teradata SQL Assistant, Teradata Studio Express
Languages: C, C++, Java, SQL, HTML, Perl, XML
Version Control/Bug Control: Git, CVS, Perforce, Jenkins, JIRA, Clear Quest, Mantis
PROFESSIONAL EXPERIENCE
Confidential, MINNEAPOLIS, MN
Big Data Consultant
Responsibilities:
- Worked on evaluation and analysis of Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
- Developed PIG Latin scripts to extract the data from the local file system output files to load into HDFS
- Importing and exporting data into HDFS and Hive using Sqoop upto 500GB a day.
- Collaborated with decision scientist’s team to execute a successful Proof of concept for Apache SOLR search engine to perform indexed fast retrieval of persisted data close to 200TB.
- Worked in seamless transitioning from batch processing to real-time kafka queue based ingestion process.
- Responsible for cluster detail layout and performance optimization.
- Monitored multiple hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in Hadoop cluster tasks of commissioning and decommissioning nodes without any effect to running jobs and data escalating the cluster size to 250 nodes with HA namenode.
- Orchestrated strategic processes for setting up data retention and archival policies for compliance and analytical needs.
- Designed a process in Big Data to comply with Compliance and regulatory data integrity needs.
- Installation/Configuration/Maintenance of Hadoop clusters for application development using Hadoop tools like Hive, Pig, Hbase, Sqoop and Flume
- Worked on Migration from Hadoop 2.2 to Hadoop 2.4 via HDP versions
- Data capacity planning and nodes forecasting.
- Defining the big data platform architecture and Data strategy for ramping up the data ingestion needs without impacting continuous data flow.
- Created SIT test plans and conducted functional/UAT testing in end-to-end lower environments leveraging industry standard process (PDDT) for using PROD data.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Worked in Data Lake Architecture and converted vast amount of raw data into required data formats and stored into HDFS.
- Performed operations like Delta processing; Validations & Transformations on hive tables and HDFS.
- Usage of Sqoop to import data into HDFS from SQL database
- Load and transform large sets of structured, semi structured using Hive and run analytics on them.
- Written impala queries for managing the data.
Confidential, NORTH CHICAGO, IL
Hadoop Administrator
Responsibilities:
- Responsible for the installation, configuration, maintenance and troubleshooting of Hadoop Cluster. Duties included monitoring cluster performance using various tools to ensure the availability, integrity and confidentiality of application and equipment.
- Deployed the Hadoop cluster in cloud environment with scalable nodes as per the business requirement.
- Create, view and publish standard and custom reports.
- View reports according to data access permissions.
- Maintain and see historical trends for defined periods.
- Supporting Multi-tenancy.
- Addressed integration issues with other components in analytics engine.
- Acceptance testing of all the different applications.
- Experience in commissioning and decommissioning nodes of Hadoop cluster.
- Imported the data from relational databases into HDFS using Sqoop.
- Involved in creating Hive tables, loading data, and writing Hive queries.
- Involved in upgrading Hadoop cluster from current version to minor version upgrade as well as to major versions.
- Developed PIG and HIVE scripting for data processing on HDFS.
- Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
- Supported technical team members for automation, installation and configuration tasks.
- Participated in evaluation and selection of new technologies to support system efficiency.
- Work with the data center planning groups, assisting with network capacity and high availability requirements.
Confidential, ST PAUL, MN
Hadoop Administrator
Responsibilities:
- Build a production and development Hadoop cluster using Apache Hadoop 2.0
- Configure the cluster properties to gain the high cluster performance by taking cluster hardware configuration as key criteria.
- Implemented HA services on the Hadoop clusters to make Hadoop services highly available.
- Implemented the resource monitoring tool Ganglia on the Hadoop clusters.
- Deployed the Hadoop cluster using Kerberos to provide secure access to the cluster.
- Responsible for importing the data (mostly log files) from various sources into HDFS using Flume.
- Installed and configure Zookeeper service for coordinating configuration-related information of all the nodes in the cluster to manage it efficiently.
- Used PIG and HIVE to generate BI reports.
- Used Jenkins for continuous integration with version control and automating jobs.
- Configuring, installing, managing and administrating HBase clusters.
- Dump the data from SQL Server to HDFS using Sqoop.
- Maintain extensive documentation on Hadoop cluster, policies, configurations and job flows.
- Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
- Importing and exporting data into HDFS using Sqoop.
- Experienced in defining job flows with Oozie.
- Loading logs data directly into HDFS using Flume.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Worked on NoSQL databases including HBase and Cassandra.
- Created Hive External tables and loaded the data in to tables and query data using HQL.
- Wrote shell scripts for rolling day-to-day processes and it is automated.
- Automated workflows using shell scripts pull data from various databases into Hadoop
- Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
Confidential, INDIANAPOLIS, IN
Hadoop Administrator
Responsibilities:
- Specifying the Cluster size, allocating Resource pool and monitoring of jobs.
- Configured the Hive set up.
- Export the result set from one SQL server to another MySQL using Sqoop.
- Helped in the HIVE queries for the analysts.
- Helped the team to increase Cluster from 25 Nodes to 40 Nodes.
- Maintain System integrity of all sub-components across the multiple nodes in the cluster.
- Monitor Cluster health and clean up logs when required.
- Perform upgrades and configuration changes.
- Upgrading the Hadoop Cluster from CDH3 to CDH4 and setup High availability Cluster
- Integrate the HIVE with existing applications
- Commission/decommission Nodes as needed.
- Manage resources in a multi-tenancy environment.
- Configured the Zookeeper in setting up the HA Cluster
- Implemented Fair schedulers on the Job tracker to share the resources of the Cluster for the Map Reduce jobs given by the users.
- Set up the compression for different volumes in the cluster.
- Configured Ganglia to send the required job metrics for chargeback.
Confidential, NY
System Administrator
Responsibilities:
- Migrating and daily data-backup in a network computing environment along with documentation of recycled data log reports in excel format.
- Managing Systems operations with final accountability for smooth installation, networking, and operation, troubleshooting of hardware and software in LINUX environment.
- Identifying operational needs of various departments and developing customized software to enhance System's productivity.
- Established/implemented firewall rules, Validated rules with vulnerability scanning tools.
- Accomplished System/e-mail authentication using LDAP enterprise Database.
- Implemented a Database enabled Intranet web site using LINUX, Apache, MySQL Database backend, HTML and PHP scripting for Database access via web browsers.
- Monitoring System Metrics and logs for any problems.
