- Around 7 years of experience in big data technologies, including Hadoop (HDFS and MapReduce), Pig, Hive, Sqoop, and Spark.
- Experience working in environments that follow Agile development and Kanban support methodologies.
- Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Flume, Pig, Hive, HBase, Oozie, and Zookeeper).
- Hands-on experience with data ingestion tools such as Apache Sqoop for moving data between HDFS and relational database systems (RDBMS).
- Excellent understanding of Hadoop architecture and its components, including HDFS (NameNode, DataNode), JobTracker, TaskTracker, YARN, Spark, and MapReduce concepts.
- Strong knowledge of NoSQL databases such as Cassandra and HBase, and of Microsoft Azure for application testing.
- Experience with the Cloudera (CDH) and Hortonworks Data Platform (HDP) distributions.
- Experience managing cluster resources by implementing the Fair and Capacity Schedulers.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Good knowledge of Hive and HBase for data processing.
- Familiarity with Teradata, Oracle, MySQL, and Cassandra databases.
- Worked with source data in a variety of formats, including JSON, CSV, Parquet, Avro, and plain text files.
- Experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java; extended Hive and Pig core functionality with custom UDFs.
- Very good understanding of partitioning and bucketing concepts in Hive, and designed tables using both.
- Knowledge of configuring and administering Hadoop clusters using major distributions such as Apache Hadoop and Cloudera.
- Good understanding of HDFS design, daemons, and high availability (HA).
- Worked on UNIX/Linux operating systems and developed various shell scripts.
- Experience designing Oozie workflows to schedule and manage data flow.
- Versatile team player with good communication, analytical, presentation, and interpersonal skills.
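For illustration, the Hive partitioning and bucketing design described above can be sketched in HiveQL (the table and column names here are hypothetical, not from any actual engagement):

```sql
-- Hypothetical example: daily-partitioned, user-bucketed events table
CREATE TABLE events (
  user_id BIGINT,
  action  STRING
)
PARTITIONED BY (event_date STRING)       -- one HDFS directory per day
CLUSTERED BY (user_id) INTO 32 BUCKETS   -- hash user_id into 32 files per partition
STORED AS ORC;
```

Partitioning prunes whole directories at query time; bucketing additionally enables efficient sampling and bucketed map-side joins on the clustered column.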
Big Data Technologies: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, Zookeeper, Scala, Apache Spark
Hadoop Distributions: Cloudera, Hortonworks
Languages: Java, C, SQL, Python, Pig Latin, Scala
IDE Tools: Eclipse, NetBeans
Operating Systems: Windows (XP, 7, 8, 10), UNIX, Linux, Ubuntu, CentOS
Application Servers: Tomcat, WebLogic
Databases: Oracle, Apache Cassandra and MySQL
Confidential, Bentonville, AR
Senior Hadoop Administrator/ Developer
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
- Day-to-day responsibilities included resolving developer issues, deploying code between environments, provisioning access for new users, providing immediate workarounds to reduce impact, and documenting issues to prevent recurrence.
- Added and removed cluster components through Cloudera Manager.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades.
- Monitored workload, job performance, and capacity planning using Cloudera Manager.
- Analyzed system failures, identified root causes, and recommended courses of action.
- Interacted with Cloudera support, logged issues in the Cloudera portal, and applied fixes per their recommendations.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Used Flume with a spooling directory source to load data from the local file system into HDFS.
- Exported data from HDFS into relational databases with Sqoop; parsed, cleansed, and mined meaningful data in HDFS using MapReduce for further analysis.
- Fine-tuned Hive jobs for optimized performance.
- Implemented custom Flume interceptors to filter data and defined channel selectors to multiplex the data into different sinks.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
- Involved in extracting data from various sources into Hadoop HDFS for processing.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database, and Sqoop.
- Monitored Solr dashboards and statistics and reviewed the Solr servers.
- Created and deployed corresponding SolrCloud collections.
- Experience with Solr integration with HBase using the Lily HBase Indexer (Key-Value Indexer).
- Created and truncated HBase tables in Hue and took backups of submitterId data.
- Configured and managed user permissions in Hue.
- Responsible for building scalable distributed data solutions using Hadoop.
- Commissioned and decommissioned nodes on a CDH5 Hadoop cluster running Red Hat Linux.
- Involved in loading data from the Linux file system to HDFS.
- Created and managed cron jobs.
- Implemented test scripts to support test-driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Experience configuring Storm to load data from MySQL into HBase using JMS.
- Involved in loading data from UNIX file system to HDFS.
- Experience in managing and reviewing Hadoop log files.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
Environment: HDFS, MapReduce, Hive 1.1.0, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, CDH5, Apache Hadoop 2.6, Spark, Solr, Red Hat Linux, MySQL, and Oracle.
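The spooling-directory ingestion described above follows the shape of a Flume agent configuration along these lines (the agent name and paths here are hypothetical):

```properties
# Hypothetical Flume agent: spooling-directory source -> memory channel -> HDFS sink
agent1.sources  = spool-src
agent1.channels = mem-ch
agent1.sinks    = hdfs-sink

agent1.sources.spool-src.type     = spooldir
agent1.sources.spool-src.spoolDir = /var/log/incoming
agent1.sources.spool-src.channels = mem-ch

agent1.channels.mem-ch.type     = memory
agent1.channels.mem-ch.capacity = 10000

agent1.sinks.hdfs-sink.type                   = hdfs
agent1.sinks.hdfs-sink.channel                = mem-ch
agent1.sinks.hdfs-sink.hdfs.path              = /data/raw/%Y-%m-%d
agent1.sinks.hdfs-sink.hdfs.fileType          = DataStream
agent1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
```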
Confidential, Bellevue, WA
- Responsible for cluster maintenance, monitoring, managing, commissioning and decommissioning data nodes, troubleshooting, reviewing data backups, and managing and reviewing log files for Hortonworks.
- Added and removed components through Cloudera.
- Monitored workload, job performance, and capacity planning using Cloudera.
- Performed major and minor upgrades and patch updates.
- Installed Hadoop ecosystem components such as Pig, Hive, HBase, and Sqoop in a cluster.
- Experience in setting up tools like Ganglia for monitoring Hadoop cluster.
- Handled data movement between HDFS and different web sources using Flume and Sqoop.
- Extracted files from NoSQL databases such as HBase through Sqoop and placed them in HDFS for processing.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components such as Hive and HBase.
- Installed and configured high availability (HA) for Hue and pointed it to the Hadoop cluster in Cloudera Manager.
- Deep and thorough understanding of ETL tools and how they apply in a big data environment while supporting and managing Hadoop clusters.
- Installed and configured MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Commissioned data nodes as data volumes grew and decommissioned data nodes from the cluster when hardware degraded.
- Set up and managed NameNode high availability (HA) to avoid a single point of failure in large clusters.
- Worked with data delivery teams to set up new Hadoop users and Linux users, set up Kerberos principals, and test HDFS and Hive access.
- Held regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
Environment: Linux, Shell Scripting, Java (JDK 1.7), Tableau, MapReduce, Teradata, SQL Server, NoSQL, Cloudera, Flume, Sqoop, Chef, Puppet, Pig, Hive, Zookeeper, and HBase
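The Oozie setup above, which runs Hive and Pig jobs in sequence, can be sketched as a workflow definition along these lines (the workflow, script, and property names here are hypothetical):

```xml
<!-- Hypothetical Oozie workflow chaining a Hive script and a Pig script -->
<workflow-app name="hive-pig-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="hive-step"/>
  <action name="hive-step">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>aggregate.hql</script>
    </hive>
    <ok to="pig-step"/>
    <error to="fail"/>
  </action>
  <action name="pig-step">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>cleanup.pig</script>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Workflow failed</message></kill>
  <end name="end"/>
</workflow-app>
```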
Confidential, Chicago, IL
Big Data Administrator
- Loaded data into HDFS from different data sources such as Oracle and DB2 using Sqoop, and loaded it into Hive tables.
- Analyzed big data sets by running Hive queries and Pig scripts.
- Integrated the Hive warehouse with HBase for information sharing among teams.
- Developed Sqoop scripts for the interaction between Pig and the MySQL database.
- Worked on static and dynamic partitioning and bucketing in Hive.
- Scripted complex HiveQL queries on Hive tables for analytical functions.
- Developed complex Hive UDFs to work with sequence files.
- Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
- Created dashboards in Tableau to create meaningful metrics for decision making.
- Performed rule checks on multiple file formats like XML, JSON, CSV and compressed file formats.
- Monitored system health and logs and responded to any warning or failure conditions.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Used storage formats such as Avro to access structured record data efficiently in complex queries.
- Implemented counters for diagnosing problems in queries, and for quality control and application-level statistics.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Implemented Log4j to trace logs and track information.
- Developed helper classes abstracting the Cassandra cluster connection to act as a core toolkit.
- Installed the Oozie workflow engine and scheduled it to run data- and time-dependent Hive and Pig jobs.
- Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
Environment: HDFS, MapReduce, Cassandra, Hive, Pig, Sqoop, Tableau, NoSQL, Shell Scripting, Maven, Git, HDP Distribution, Eclipse, Log4j, JUnit, Linux.
- Installed and configured Hadoop on a cluster using the Hortonworks distribution.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Collected, aggregated, and moved large amounts of streaming data into HDFS using Flume, and defined job flows using Oozie.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources and applications.
- Involved in Unit level and Integration level testing.
- Scheduled and managed cron jobs and wrote shell scripts to generate alerts.
- Prepared design documents and functional documents.
- Added extra nodes to the cluster, based on requirements, to keep it scalable.
- Involved in running Hadoop jobs to process millions of records of text data.
- Involved in loading data from the local (Linux) file system to HDFS.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and executing jobs.
- Submitted detailed reports of daily activities on a weekly basis.
- Worked with NoSQL databases for data storage and processing.
- Worked with Talend Open Studio for ETL processes.
Environment: MS SQL Server 2008, Hadoop HDFS, Pig, Sqoop, Hive, Flume, Oozie, MySQL, MongoDB, Eclipse, JSP, JDBC, UNIX Shell Scripting, Talend.
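The map/shuffle/reduce flow behind the MapReduce and streaming jobs above can be simulated locally with a short Python sketch (word count stands in for the real processing logic):

```python
"""Minimal local simulation of the map -> shuffle -> reduce flow of a
MapReduce job; word count is used as a stand-in example."""
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit (key, 1) per word
    for word in line.split():
        yield (word, 1)

def reducer(key, values):
    # Reduce phase: aggregate all values for one key
    return (key, sum(values))

def run_job(lines):
    pairs = [kv for line in lines for kv in mapper(line)]
    # Shuffle phase: sorting by key brings identical keys together,
    # mirroring the framework's sort-and-group step
    pairs.sort(key=itemgetter(0))
    return [reducer(k, (v for _, v in grp))
            for k, grp in groupby(pairs, key=itemgetter(0))]

print(run_job(["hive pig hive", "spark hive"]))
# -> [('hive', 3), ('pig', 1), ('spark', 1)]
```

The same three-stage structure applies whether the mapper and reducer are Java classes or Hadoop Streaming scripts reading stdin.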
- Responsible for developing and enhancing all modules of raildocs and getting them running quickly with online features.
- Developed Python batch processors to consume and produce various feeds.
- Developed internal auxiliary web apps using the Python Flask framework with a CSS/HTML front end.
- Used Python scripts to update content in the database and manipulate files.
- Generated Python Django forms to record data from online users.
- Engineered stable, isolated environments per game team.
- Managed our servers in development, testing, and production.
- Used the Subversion version control tool to coordinate team development.
- Developed SQL queries, stored procedures, and triggers using Oracle SQL and PL/SQL.
- Used the Linux profiler Valgrind to optimize code.
- Implemented locking mechanisms using multi-threading functionality.
Environment: C++, Python, Perl, Linux, Shell Scripting, JavaScript, jQuery, JSON, MySQL, Apache.
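The multi-threaded locking mechanism mentioned above (implemented in C++ on this project) can be illustrated with a minimal Python sketch of a lock-guarded shared counter; the names are illustrative:

```python
"""Sketch of a locking pattern: a mutex guarding a shared counter so that
concurrent read-modify-write updates are not lost."""
import threading

class SafeCounter:
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # Without the lock, two threads could read the same value and
        # each write value+1, losing one update
        with self._lock:
            self.value += 1

counter = SafeCounter()
threads = [threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 8000
```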