- 5 years of professional IT experience and technical proficiency in the Big Data space, with hands-on expertise in development on the Hadoop platform and Java.
- Extensive working experience with Hadoop ecosystem components such as MapReduce (MRv1, YARN), Hive, Pig, Sqoop, and Oozie.
- Proficient in writing MapReduce programs and using the Apache Hadoop Java API to analyze structured and unstructured data.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as NameNode, DataNode, MapReduce, and HDFS.
- Experience working with cloud infrastructure such as Amazon Web Services (AWS).
- Experience in launching EMR clusters, Redshift clusters, EC2 instances, S3 buckets, AWS Data Pipeline, and Simple Workflow Service (SWF) instances.
- Experience in ingesting streaming data into Hadoop using Spark, the Storm framework, and Scala.
- Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience in writing Pig Latin scripts to sort, group, join, and filter data.
- Experience in writing UDFs in Java for Hive and Pig.
- Worked on UNIX shell scripts as part of the ETL process to implement business logic, and scheduled jobs using the CA7 and Oozie schedulers.
- Experience in writing custom input formats for MapReduce and working with various file formats such as Avro, XML, JSON, and log data.
- Worked with different Hive file formats such as RCFile, SequenceFile, ORC, and Parquet.
- Experience in using Apache Sqoop to import and export data to and from HDFS and Hive.
- Hands-on experience in setting up workflows using the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
- Good knowledge of NoSQL databases: HBase, Cassandra, and MongoDB.
- Working experience with Pentaho Report Designer and Tableau visualization.
- Experience in developing applications using Core Java, JSP, HTML, and CSS.
- Worked on customizing log4j.properties to redirect Hive/HBase logs to databases.
- Good experience working with AWS, Cloudera, and Pivotal HD distributions.
- Knowledge of Kafka, Mahout machine learning, and R.
- Comprehensive knowledge of the Software Development Life Cycle and Agile methodology, coupled with excellent communication skills.
- Experience working in both team and individual environments. Always eager to learn new technologies and implement them in challenging environments.
- Strong analytical and problem-solving skills.
- Team player with good interpersonal, communication, and presentation skills. Exceptional ability to learn and master new technologies and to deliver outputs on short deadlines.
Hadoop Technologies and Distributions: Apache Hadoop, HDP, Cloudera Hadoop Distribution (CDH3, CDH4, CDH5), AWS, Pivotal HD (2.0)
Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Flume, Kafka, ZooKeeper, HCatalog, Spark, Storm
NoSQL Databases: Cassandra, MongoDB, HBase
Programming: C, Core Java 7/8, Advanced Java, PL/SQL, Shell Scripting
AWS Hadoop Services: S3, EMR, Simple Workflow, Data Pipeline, Redshift
RDBMS: Oracle, MySQL, SQL Server
Operating Systems: Linux (RedHat, CentOS), Windows XP/7/8
Web Servers: Apache Tomcat
ETL: Pentaho Report Designer
BI Tools: Tableau
Confidential, Bellevue, WA
Senior Hadoop Developer
- Involved in ingesting data into IDW staging directly from BEAM (an in-built component for ingesting real-time data into Hadoop), using Apache Storm to push data into HDFS.
- Used Oozie Operational Services for batch processing and dynamic workflow scheduling to run multiple Hive, shell script, and Pig jobs that run independently based on time and data availability.
- Part of the design team for various generic components such as SCD and data validation.
- Developed the solution for several data ingestion channels and patterns; also involved in production issues.
- Worked extensively on creating end-to-end data pipeline orchestration using Oozie.
- Used shell scripting for automation.
- Worked on QA support, test data creation, and unit testing activities.
- Used HBase in conjunction with Hive/Pig as required.
- Worked on Pig joins and join optimization, and on processing incremental data using Hadoop.
- Created Oozie jobs using Sqoop to export data from Hadoop to Teradata development.
- Involved in developing a customized in-built tool, Data Movement Framework (DMF), for ingesting data from external and internal sources into Hadoop using Sqoop and shell scripts.
- Proposed an automated system using shell scripts to implement imports with Sqoop.
- Worked in an Agile development approach and managed the Hadoop teams across various sprints.
Environment: Hortonworks Data Platform (HDP), HDFS, HBase, Hive, Java, Sqoop, Oracle, MySQL, Storm.
Confidential, Bentonville, AR
Senior Hadoop Developer
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Worked on automating delta feeds from Teradata using Sqoop, and from FTP servers to Hive.
- Involved in exporting data from Hadoop to Greenplum using the gpload utility.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into the Hive schema for analysis.
- Used Sqoop to import data from RDBMS to the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzing them by running Hive queries and Pig scripts.
- Participated in gathering requirements from experts and business partners and converting them into technical specifications.
- Implemented daily workflow for extraction, processing and analysis of data with Oozie.
- Involved in loading data from the Linux file system to HDFS.
Environment: Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, HDFS, Linux, Oozie.
- Worked on analyzing the Hadoop cluster using different big data analytics tools including Pig, Hive, and MapReduce.
- Launched and set up a Hadoop cluster on AWS, including configuring the different Hadoop components.
- Managed the Hive database, including data ingestion and indexing.
- Launched the EMR cluster and Redshift cluster.
- Implemented an Amazon EMR (Elastic MapReduce) job to process data in ZIP format and convert it to gzip format.
- Involved in customizing the input format for ZIP files (ZipInputFormat).
- Cleansed and processed the ZIP file data in MapReduce.
- Created the JAR file and uploaded it to an S3 bucket.
- Adjusted delimiters in the data using EMR.
- Created Data Pipeline jobs to automate the process.
- Scheduled the data load process into the Redshift database.
- Monitored the EMR jobs.
- Implemented and ran queries on the Redshift cluster.
- Implemented autoscaling for the Redshift database.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Worked on tuning the performance of Pig queries.
- Experience processing unstructured data using Pig and Hive.
- Worked on evaluating complex business metrics in Pig and MapReduce.
Environment: Amazon EMR, Data Pipeline, MapReduce (Java), S3, Redshift, Java, Hive, Pig, SWF Java API
- Responsible for designing and implementing the ETL process to load data from different sources, perform data mining, and analyze data using visualization/reporting tools to improve system performance.
- Collected logs from physical machines and integrated them into HDFS using Flume.
- Developed custom MapReduce programs to extract the required data from the logs.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Imported data frequently from Teradata to HDFS using Sqoop.
- Used Tableau for data visualization and report generation.
- Managed and scheduled jobs using Oozie on a Hadoop cluster.
- Experience with the Hadoop stack, cluster architecture, and cluster monitoring.
- Involved in defining job flows, managing and reviewing log files.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs.
- Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
- Extracted files from different sources such as Teradata and DB2, placed them into HDFS using Sqoop, and preprocessed the data for analysis.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: JDK 1.5, Hadoop, HDFS, Pig, Hive, MapReduce, HBase, Sqoop, Oozie, Flume, Tableau.