- Over 7 years of experience in Analysis, Architecture, Design, Development, Testing, Maintenance and User Training of software applications, including Big Data, Hadoop and HDFS environments, and Java development.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
- Hands-on experience using Sqoop to import data into HDFS from RDBMS and vice versa.
- Used different Hive SerDes such as RegexSerDe and HBaseSerDe.
- Experience in analyzing data using Hive, Pig Latin, and custom MR programs in Java.
- Hands-on experience with job scheduling and cluster coordination tools such as Oozie and ZooKeeper.
- Clear understanding of Hadoop architecture and its components such as HDFS, JobTracker and TaskTracker, NameNode and DataNode, Secondary NameNode, and MapReduce programming.
- Hands-on experience writing custom UDFs to extend Hive and Pig core functionality.
- Hands-on experience processing log files to extract data and copy it into HDFS using Flume.
- Wrote Hadoop test cases to validate job inputs and outputs.
- Hands-on experience integrating Hive and HBase.
- Experience with NoSQL databases: MongoDB, HBase, and Cassandra.
- Experience in Hadoop administration activities such as installing and configuring clusters using Apache Hadoop and Cloudera distributions.
- Experience with Amazon Web Services (AWS), including EC2, S3, EBS, ELB, Route 53, Auto Scaling, CloudFront, CloudWatch, and Security Groups.
- Familiar with data warehousing and ETL tools like Informatica.
- Extensive experience in SOA-based solutions: Web Services, Web API, WCF, SOAP, and RESTful APIs.
- Hands-on experience installing, configuring, and using Hadoop components such as Hadoop MapReduce (MR1), YARN (MR2), HDFS, Hive, Pig, Flume, Sqoop, Spark, and Kafka.
- Experience in Java, J2EE, Web Services, SOAP, HTML, and XML related technologies, demonstrating strong analytical and problem-solving skills, computer proficiency, and the ability to follow projects through from inception to completion.
- Extensive experience with Oracle, DB2, SQL Server, and MySQL databases, and with Core Java concepts such as OOP, multithreading, collections, and I/O.
- Hands-on experience with JAX-WS, JSP, Servlets, Struts, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, Linux, UNIX, XML, and HTML.
- Developed applications using Java, RDBMS, and Linux shell scripting.
- Experience in the complete project life cycle of client-server and web applications.
- Good understanding of data mining and machine learning techniques.
- Experience in administration, installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning of Red Hat Linux.
- Experience in scripting to deploy monitors and checks and to automate critical system administration functions.
- Good interpersonal and communication skills, strong problem-solving skills, the ability to explore and adopt new technologies with ease, and a good team player.
- Motivated to take independent responsibility, with a strong work ethic and a desire to succeed and make significant contributions to the organization.
Big Data/Hadoop Developer
Confidential, Dallas, TX
- Involved in the full project life cycle, from design, analysis, and logical and physical architecture modeling through development, implementation, and testing.
- Wrote scripts to distribute queries for performance test jobs in the Amazon data lake.
- Created Hive tables, loaded transactional data from Teradata using Sqoop, and worked with 2 petabytes of highly unstructured and semi-structured data.
- Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data.
- Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
- Developed optimal strategies for distributing web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
- Developed Pig Latin scripts to replace the existing legacy process with Hadoop, with the output fed to AWS S3.
- Responsible for building scalable distributed data solutions using Cloudera Hadoop.
- Designed and developed automated test scripts using Python.
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Wrote Pig scripts to transform raw data from several data sources into baseline data.
- Analyzed the SQL scripts and designed the solution to implement them using PySpark.
- Implemented Hive GenericUDFs to incorporate business logic into Hive queries.
- Responsible for developing a data pipeline with Amazon AWS to extract data from web logs and store it in HDFS.
- Loaded streaming data from Kafka into HDFS, HBase, and Hive by integrating with Storm.
- Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website.
- Supported data analysis projects using Elastic MapReduce on Amazon Web Services (AWS) and performed data import and export with S3.
- Worked on MongoDB using CRUD (Create, Read, Update, and Delete) operations, indexing, replication, and sharding features.
- Designed the HBase row key to store text and JSON as key-values in an HBase table, so that rows can be retrieved with get/scan in sorted order.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Worked on custom Talend jobs to ingest, enrich, and distribute data in the Cloudera Hadoop ecosystem.
- Created Hive tables and worked on them using HiveQL.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Developed multiple POCs using PySpark and deployed them on the YARN cluster, compared the performance of Spark with Hive and SQL, and was involved in the end-to-end implementation of the ETL logic.
- Developed syllabus/curriculum data pipelines from syllabus/curriculum web services to HBase and Hive tables.
- Worked on cluster coordination services through ZooKeeper.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Built applications using Maven and integrated them with CI servers such as Jenkins to run build jobs.
- Exported the analyzed data to RDBMS using Sqoop to generate reports for the BI team.
- Worked collaboratively with all levels of business stakeholders to architect, implement, and test Big Data based analytical solutions from disparate sources.
- Created cubes in Talend to build different types of data aggregations and to visualize them.
- Followed Agile methodologies, including daily scrum meetings and sprint planning.
Big Data/Hadoop Developer
Confidential, San Antonio, TX
- Responsible for installing and configuring Hive, Pig, HBase, and Sqoop on the Hadoop cluster, and created Hive tables to store the processed results in tabular format.
- Configured Spark Streaming in Scala to receive real-time data from Apache Kafka and store the stream data in HDFS.
- Developed Sqoop scripts to move data between Hive and the Vertica database.
- Processed data into HDFS and analyzed it using MapReduce, Pig, and Hive to produce summary results from Hadoop for downstream systems.
- Built servers on AWS: imported volumes, launched EC2 instances, and created security groups, Auto Scaling groups, load balancers, Route 53, SES, and SNS in the defined virtual private cloud (VPC).
- Wrote MapReduce code to process and parse data from various sources, and stored the parsed data in HBase and Hive using HBase-Hive integration.
- Streamed an AWS log group into a Lambda function to create ServiceNow incidents.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Created Managed tables and External tables in Hive and loaded data from HDFS.
- Developed Spark code using Scala and Spark SQL for faster processing and testing, and performed complex HiveQL queries on Hive tables.
- Scheduled several time-based Oozie workflows by developing Python scripts.
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, SPLIT to extract data from data files to load into HDFS.
- Exported data to RDBMS servers using Sqoop and processed it for ETL operations.
- Worked on AWS S3 buckets to store CloudFormation templates and created EC2 instances on AWS.
- Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, packages, and MySQL.
- Optimized the Hive tables using techniques such as partitioning and bucketing to provide better query performance.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Implemented Hadoop on AWS EC2 using a few instances to gather and analyze log files.
- Worked with Spark and Spark Streaming, creating RDDs and applying transformations and actions.
- Created partitioned tables and loaded data using both static partition and dynamic partition method.
- Developed custom Apache Spark programs in Scala to analyze and transform unstructured data.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
- Used Kafka for publish-subscribe messaging as a distributed commit log, taking advantage of its speed, scalability, and durability.
- Followed the Test Driven Development (TDD) process, with extensive experience in Agile and Scrum methodologies.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
- Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Involved in cluster maintenance, monitoring, and troubleshooting, and managed and reviewed data backups and log files.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
- Analyzed the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase, and Sqoop.
- Improved performance by tuning Hive and MapReduce.
- Researched, evaluated, and used new technologies, tools, and frameworks in the Hadoop ecosystem.
Confidential, Tysons Corner, VA
- Executed Hive queries that helped analyze market trends by comparing new data with EDW reference tables and historical data.
- Managed and reviewed Hadoop log files for the JobTracker, NameNode, Secondary NameNode, DataNode, and TaskTracker.
- Tested raw market data and executed performance scripts on the data to reduce runtime.
- Loaded the created files into HBase for faster access to large sets of customer data without affecting performance.
- Imported and exported data between HDFS and RDBMS using Sqoop and Kafka.
- Executed Hive jobs to parse the logs and structure them in relational format to enable effective queries on the log data.
- Created internal and external Hive tables for loading data, and wrote queries that run internally as MapReduce jobs to process the data.
- Developed Pig scripts to capture data changes and process records between new data and existing data in HDFS.
- Created scalable, performant machine learning applications using Mahout.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Involved in importing data from different data sources and performed various queries using Hive, MapReduce, and Pig Latin.
- Involved in loading data from the local file system into HDFS using HDFS shell commands.
- Experience with UNIX shell scripts for processing and loading data from various interfaces into HDFS.
- Developed different components of the Hadoop ecosystem process involving MapReduce and Hive.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Big Data, Java, Flume, Kafka, YARN, HBase, Oozie, SQL scripting, Linux shell scripting, Mahout, Eclipse, and Cloudera.
Confidential, Atlanta, GA
- Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Spark, Avro, ZooKeeper, etc.) with Cloudera Distribution of Hadoop (CDH4).
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and processing.
- Involved in installing Hadoop ecosystem components.
- Imported and exported data into HDFS, Pig, Hive, and HBase using Sqoop.
- Responsible for managing data coming from different sources, ingested through Flume and from relational database management systems using Sqoop.
- Involved in requirements gathering, design, development, and testing.
- Worked on loading and transforming large sets of structured and semi-structured data into the Hadoop system.
- Developed simple and complex MapReduce programs in Java for data analysis.
- Loaded data from various data sources into HDFS using Flume.
- Developed Pig UDFs to pre-process the data for analysis.
- Worked on the Hue interface for querying the data.
- Created Hive tables to store the processed results in a tabular format.
- Developed Hive scripts for implementing dynamic partitions.
- Developed Pig scripts for data analysis and extended their functionality by developing custom UDFs.
- Extensive knowledge of Pig scripts using bags and tuples.
- Experience in managing and reviewing Hadoop log files.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Hadoop (CDH4), UNIX, Eclipse, HDFS, Java, MapReduce, Apache Pig, Hive, HBase, Oozie, SQOOP and MySQL.
- Extensively involved in the design and development of JSP screens to suit specific modules.
- Converted the application's console printing of process information to proper logging using log4j.
- Developed the business components (in core Java) used in the JSP screens.
- Involved in the implementation of logical and physical database design by creating suitable tables, views and triggers.
- Developed related procedures and functions used by JDBC calls in the above components.
- Extensively involved in performance tuning of Oracle queries.
- Created components to extract application messages stored in xml files.
- Executed UNIX shell scripts for command-line administrative access to the Oracle database and for scheduling backup jobs.
- Created WAR files and deployed them to the web server.
- Performed source and version control using VSS.
- Involved in maintenance support.