Sr Big Data Consultant Resume
SUMMARY
- 9 years of IT experience, including requirement gathering and designing and developing applications for Hadoop and Data Warehousing solutions
- Over 4 years of experience working with Apache Hadoop components such as Hive, YARN, MR, HDFS, Sqoop, Pig, Hue, Oozie, HBase, Flume, Solr, Kafka, Impala, Spark and Zookeeper
- Hadoop cluster administration using Ambari and Cloudera Manager, including performance tuning, cluster monitoring, configuration management, resource management and troubleshooting.
- Hadoop cluster design, including space allocation, user and group creation, adding new data nodes, and installing and configuring Hadoop and its ecosystem tools.
- Experience in migrating data and structure from IBM DB2, Oracle, Teradata to Hadoop
- Worked on requirement analysis, configuration and troubleshooting of Hadoop ecosystem tools: Hive, Sqoop, Pig, Hue, Oozie, Flume, Solr, Kafka, Impala, Spark and Zookeeper.
- Excellent understanding of Hadoop architecture and its components, including HDFS, MRv1 and MRv2 (YARN), NameNode, DataNode and the MapReduce programming paradigm.
- Data Analytics using Hive, Pig, Python, Impala and Spark.
- Have very good Data Analysis and Data Validation skills and good exposure to the entire Software Development Lifecycle (SDLC).
- Expertise in writing MapReduce programs in Java.
- Experience in implementing security on HiveServer2 using Apache Sentry.
- Benchmarked Hive table file formats (SequenceFile, Avro, RCFile and ORC) to determine optimal storage and query performance.
- Experience in creating supervised and unsupervised Machine Learning models for data analytics
- Created Hadoop cluster on AWS environment for application development.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java. Extending Hive and Pig core functionality by writing custom UDFs.
- Expert in building, deploying and maintaining applications.
- Very good data modeling skills: understanding data requirements and building logical and physical data models using Erwin and PowerDesigner.
- Experience with RDBMS: IBM DB2, Oracle, MS Access and SQL Server.
- Experience in creating Tableau dashboards for reporting.
- Experience in designing programs using Python, Java, shell and Perl.
- Provided 24/7 operational support for production servers and related infrastructure clusters.
- An excellent team player and self-starter with good communication skills and a proven ability to finish tasks before target deadlines.
TECHNICAL SKILLS
Programming Languages: Java, Perl, Python, PIG and PL/SQL.
Java Technologies: JDBC.
Databases: Oracle, MS Access, SQL Server, Teradata and NoSQL database (HBase)
IDE’s & Utilities: Eclipse and JCreator, NetBeans.
Web Dev. Technologies: HTML, XML.
Protocols: TCP/IP, HTTP and HTTPS.
Operating Systems: Linux, AIX, Solaris, macOS and Windows
Hadoop ecosystem: Hive, YARN, MR, HDFS, Sqoop, Pig, Hue, Oozie, Flume, Solr, Kafka, Impala, Spark and Zookeeper
PROFESSIONAL EXPERIENCE
Confidential
Sr Big Data Consultant
Responsibilities:
- Created jobs for an insurance analytics platform streaming real-time data using Spark, Hive, Kafka and Pig to determine insurance rates for drivers based on driving patterns and behavior.
- Created an application using Apache Solr for searching keywords in data files.
- Created an application for streaming real-time data using Storm and integrated it with Hive.
- Analyzed tax reporting data and created an approach for archiving old tax data from SQL Server to Hadoop (HDFS) for a retail client.
- Migrated data from relational databases into Hadoop (Hive); see the Sqoop sketch after this list.
- Designed a data quality monitoring system for a media and entertainment client; analyzed attributes containing loan data and prepared the data profile and data conversion documents.
- Designed a risk audit process for a healthcare client, created a risk assessment database for performing risk assessments and audits, and used Tableau for visualization.
- Created and configured Hortonworks Hadoop cluster on AWS.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Writing Hive queries for data analysis to meet the business requirements.
- Gathered requirements from business partners, understood the data requirements, prepared the requirements document and obtained agreement from the client that all data requirements were complete and understood.
- Creating Hive tables and querying them using HiveQL.
- Used Oozie workflows to load data into Hive tables; see the submission sketch after this list.
- Installed and configured Pig and wrote Pig Latin scripts.
- Created a data ingestion tool for importing data into Hive (HDFS), scheduled Hive actions using Oozie workflows and created MapReduce jobs to analyze the data.
- Worked on tuning the performance of Hive queries.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
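Illustrative Sqoop sketch for the relational-to-Hive migration bullet above. This is a minimal example rather than the actual project code; the connection string, credentials, table and database names are hypothetical placeholders.

    #!/usr/bin/env bash
    # Minimal Sqoop import of one relational table straight into a Hive table.
    # All names below (host, database, table, user) are placeholders.
    sqoop import \
      --connect "jdbc:sqlserver://dbhost:1433;databaseName=tax_archive" \
      --username etl_user \
      --password-file /user/etl/.db_password \
      --table TAX_DETAIL \
      --hive-import \
      --hive-database archive_db \
      --hive-table tax_detail \
      --num-mappers 4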
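Illustrative submission sketch for the Oozie-driven Hive load mentioned above, assuming a workflow.xml containing a Hive action is already deployed on HDFS; host names and paths are placeholders.

    #!/usr/bin/env bash
    # Write a minimal job.properties and submit the workflow to the Oozie server.
    # Oozie resolves ${nameNode} from the property defined above it.
    printf '%s\n' \
      'nameNode=hdfs://nn-host:8020' \
      'jobTracker=rm-host:8032' \
      'oozie.use.system.libpath=true' \
      'oozie.wf.application.path=${nameNode}/user/etl/workflows/hive-load' \
      > job.properties

    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run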
Confidential
Hadoop Administrator
Responsibilities:
- Application DBA for a financial services client, responsible for managing a Hadoop cluster with 1.5 petabytes of configured capacity for application development.
- Designed the security model for a security audit project in a credit card data warehouse; analyzed and generated a report of users accessing confidential data, designed an approach to restrict user access and implemented the new security model in the data warehouse.
- Designed an application for fraud detection in credit card processing using Sqoop, Flume and Hive, and worked with the business to create a fraud rules engine to identify fraudulent transactions.
- Worked with the client on requirement analysis, workspace design and troubleshooting issues while developing applications on Hadoop ecosystem tools Hive, Sqoop, Pig, Hue, Oozie, Flume, Kafka, Impala, Spark and Zookeeper.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Designed approach to migrate database objects and data from DB2, Oracle and Teradata to Hadoop (Hive).
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Experienced in managing and reviewing Hadoop log files.
- Designed a project for migrating IVR (Interactive Voice Response) data from Oracle to Hive using Sqoop, analyzing the data and generating reports using Impala; helped the development team with performance tuning and debugging issues faced by clients.
- Served as a team member for upgrading CDH (Cloudera Distribution including Apache Hadoop) to the latest version and implemented High Availability in the Hadoop cluster.
- Facilitated knowledge transfer sessions.
- Created a script to generate a space utilization report for upper management, monitoring the overall space used by different lines of business; see the sketch after this list.
- Created performance metrics for queries running in Hive on MapReduce, Hive on Tez, and Impala.
- Experience in creating Sqoop jobs for importing data from various sources.
- Created Flume jobs for importing log files into HDFS; an illustrative agent configuration follows this list.
- Job management using Fair scheduler.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Created partitions and buckets to optimize Hive query performance; see the DDL sketch after this list.
- Worked with the business to design the architecture for databases and directory structures in the Hadoop environment for different clients; calculated space as part of the space requirement analysis and allocated space quotas based on the replication factor.
- Benchmarked Hive table file formats (SequenceFile, Avro, RCFile and ORC) to determine optimal storage and query performance; see the sketch after this list.
- Implemented security using Apache Sentry on HiveServer2; illustrative role grants follow this list.
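Illustrative sketch of the space utilization report script mentioned above, assuming one HDFS directory per line of business under /data; the directory layout and mail recipient are placeholders.

    #!/usr/bin/env bash
    # Summarize HDFS space used per line-of-business directory and mail the report.
    # -du -s gives one summary line per directory, -h prints human-readable sizes.
    REPORT="/tmp/hdfs_space_report_$(date +%Y%m%d).txt"
    {
      echo "HDFS space utilization report - $(date)"
      hdfs dfs -du -s -h '/data/*'
    } > "$REPORT"
    mail -s "HDFS space utilization report" bigdata-mgmt@example.com < "$REPORT"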
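Illustrative sketch of a Flume job of the kind mentioned above: a single agent tails an application log and lands it in HDFS. The agent name, log path, HDFS target and config location are placeholders.

    #!/usr/bin/env bash
    # Write a minimal Flume agent config (exec source -> memory channel -> HDFS sink)
    # and start the agent. All paths and names are placeholders.
    printf '%s\n' \
      'logagent.sources  = tail-src' \
      'logagent.channels = mem-ch' \
      'logagent.sinks    = hdfs-sink' \
      'logagent.sources.tail-src.type = exec' \
      'logagent.sources.tail-src.command = tail -F /var/log/app/app.log' \
      'logagent.sources.tail-src.channels = mem-ch' \
      'logagent.channels.mem-ch.type = memory' \
      'logagent.channels.mem-ch.capacity = 10000' \
      'logagent.sinks.hdfs-sink.type = hdfs' \
      'logagent.sinks.hdfs-sink.hdfs.path = /data/logs/app/%Y-%m-%d' \
      'logagent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true' \
      'logagent.sinks.hdfs-sink.hdfs.fileType = DataStream' \
      'logagent.sinks.hdfs-sink.channel = mem-ch' \
      > /tmp/log-agent.conf

    flume-ng agent --name logagent --conf /etc/flume-ng/conf --conf-file /tmp/log-agent.conf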
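Illustrative DDL sketch for the partitioning and bucketing bullet above, run through beeline against HiveServer2; table and column names are placeholders.

    #!/usr/bin/env bash
    # Partition by date so queries prune whole directories, and bucket by
    # customer_id so joins and sampling on that key touch fewer files.
    beeline -u jdbc:hive2://hive-host:10000/default -e "
      CREATE TABLE ivr_calls (
        call_id     BIGINT,
        customer_id BIGINT,
        duration_s  INT
      )
      PARTITIONED BY (call_date STRING)
      CLUSTERED BY (customer_id) INTO 32 BUCKETS
      STORED AS ORC;
    "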
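Illustrative sketch of the file format benchmarking above: the same source data is copied into SequenceFile, Avro, RCFile and ORC tables and one representative query is timed against each. Table names and the sample query are placeholders.

    #!/usr/bin/env bash
    # Build one copy of the source table per storage format, then time a query.
    for fmt in SEQUENCEFILE AVRO RCFILE ORC; do
      beeline -u jdbc:hive2://hive-host:10000/default -e "
        DROP TABLE IF EXISTS txn_${fmt};
        CREATE TABLE txn_${fmt} STORED AS ${fmt} AS SELECT * FROM txn_raw;
      "
      echo "Query time for ${fmt}:"
      time beeline -u jdbc:hive2://hive-host:10000/default -e \
        "SELECT COUNT(*), SUM(amount) FROM txn_${fmt};" > /dev/null
    done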
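Illustrative sketch of the Sentry role grants mentioned above, issued through beeline against HiveServer2 by a Sentry admin user; role, group and database names are placeholders.

    #!/usr/bin/env bash
    # Create a read-only role, grant it SELECT on one database, and map it
    # to an OS/LDAP group so members of that group inherit the access.
    beeline -u jdbc:hive2://hive-host:10000/default -e "
      CREATE ROLE analyst_role;
      GRANT SELECT ON DATABASE card_dw TO ROLE analyst_role;
      GRANT ROLE analyst_role TO GROUP analysts;
    "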
Confidential
DB2 Application DBA
Responsibilities:
- Managed one of Confidential's biggest client data warehouses end to end for application development; the data warehouse is 250+ TB in size and currently runs on DB2 9.7 for AIX.
- Application DBA for the Integrated Credit Card Decision project; designed the schema and helped the development team with performance tuning and installing changes in the production environment.
- Designed schemas and objects for managing data and generating reports in the data warehouse for credit cards issued in the Asia region.
- Created a data model for the credit card warehouse to enable EMV chip card data.
- Responsible for database reorganization and updating statistics using utilities such as export, import, load, reorg, runstats, db2look, db2move, ingest, backup and restore; see the maintenance sketch after this list.
- Worked on Source retirement project to migrate applications from DB2 Mainframe to DB2 AIX.
- Created shell script to archive old historical data.
- Created a Perl script to automate the table creation process.
- Helped the application team fine-tune SQL queries to reduce runtime.
- Created a shell script to monitor space consumed by tables created by various business users; see the sketch after this list.
- Created security model for new schemas and tables.
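Illustrative sketch of the reorg and runstats maintenance mentioned above, for a single table; the database, schema and table names are placeholders.

    #!/usr/bin/env bash
    # Reorganize one table and refresh optimizer statistics so the DB2
    # optimizer has current distribution and index information.
    db2 CONNECT TO CARDDW
    db2 "REORG TABLE DWH.CREDIT_TXN"
    db2 "RUNSTATS ON TABLE DWH.CREDIT_TXN WITH DISTRIBUTION AND DETAILED INDEXES ALL"
    db2 CONNECT RESET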
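Illustrative sketch of the table space monitoring script above, based on the SYSIBMADM.ADMINTABINFO administrative view; the database name and schema filter are placeholders.

    #!/usr/bin/env bash
    # Report the largest tables owned by business-user schemas, in MB
    # (physical sizes in ADMINTABINFO are reported in KB).
    db2 CONNECT TO CARDDW
    db2 -x "SELECT TABSCHEMA, TABNAME,
                   (DATA_OBJECT_P_SIZE + INDEX_OBJECT_P_SIZE) / 1024 AS SIZE_MB
            FROM   SYSIBMADM.ADMINTABINFO
            WHERE  TABSCHEMA LIKE 'USR%'
            ORDER  BY SIZE_MB DESC
            FETCH FIRST 20 ROWS ONLY"
    db2 CONNECT RESET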