- Experience in developing, implementing, configuring and testing various systems using Hadoop technologies.
- Good understanding of Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker, and of the YARN architecture.
- Experience in using HiveQL for analyzing, querying and summarizing huge data sets.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Used Pig as an ETL tool to perform transformations, joins, filters and some pre-aggregation.
- Developed User Defined Functions (UDFs) for Pig and Hive using Java-based languages.
- Queried both Managed and External tables created by Hive using Impala.
- Experience in loading logs from multiple sources directly into HDFS using Flume.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Experience in scheduling jobs through Oozie and knowledge of Autosys, TAC and Zena.
- Experience in processing real-time data using Spark and Scala.
- Good knowledge of integrating Spark Streaming with Kafka for real-time processing of streaming data.
- Hands-on experience with message brokers such as Apache Kafka and RabbitMQ.
- Experience working with MapR volumes and snapshots for data redundancy.
- Experience in fetching data into a Hadoop data lake from various databases such as MySQL, Oracle, DB2, Teradata and SQL Server using Sqoop.
- Hands-on evaluation of ETL (Talend) and OLAP tools, recommending the most suitable solutions based on business requirements.
- Experience in generating reports using Tableau by connecting to Hive.
- Experience in using Kerberos to authenticate end users of Hadoop in secure mode.
- Experience in working with file formats such as Parquet, Avro, RCFile, ORC, SequenceFile and JSON.
- Excellent knowledge of UNIX and shell scripting.
- Expertise in design and development of web applications involving J2EE technologies with Java, Spring, EJB, Hibernate, Servlets, JSP, Struts, Web Services, XML, JMS and JDBC.
- Extensive experience in using Relational databases like Oracle, SQL Server, DB2, Teradata and MySQL.
- Experience in working with different Hadoop distributions such as CDH, MapR and Hortonworks.
- Expertise in using the Tomcat server and application servers such as JBoss and WebLogic.
- Good knowledge of the Finance and Health Care domains.
Hadoop / Big Data Stack: HDFS, YARN, MapReduce, Pig, Hive, Spark, SparkSQL, Scala, Kafka, ZooKeeper, HBase, Sqoop, Flume, Shell scripting, Oozie.
Hadoop Distributions: MapR, Hortonworks, Cloudera.
Databases: Oracle, MySQL, DB2, Teradata, SQL Server, Sybase.
NoSQL Databases: HBase, Cassandra.
Query Languages: HiveQL, SQL, Pig Latin.
Frameworks: MVC, Struts, Spring, Hibernate.
Build & Integration Tools: Maven, Ant, Jenkins.
Operating Systems: Windows, Linux, Unix and CentOS.
Sr. Big Data Engineer
- Developed a Sqoop framework to ingest historical and incremental data from Oracle, DB2, SQL Server and other sources.
- Used Flume to read messages from a JMS queue and load them into HDFS.
- Developed MapReduce programs using Java to parse the raw data, populate staging tables and store the refined data in partitioned tables in HDFS.
- Transformed raw data by developing Pig scripts and loaded the data into HBase tables.
- Developed custom UDFs to generate unique keys for use in Pig transformations.
- Designed the HBase schema to avoid hotspotting and exposed data from HBase tables to the UI through a REST API.
- Identified control characters in the data and developed scripts to remove them.
- Converted existing Pig scripts to Spark to improve performance.
- Helped market analysts by creating Hive queries to spot emerging trends by comparing fresh data with HDFS reference tables and historical metrics.
- Developed Spark code in Scala for faster data processing using RDDs and the DataFrame API.
- Executed Spark SQL queries against Hive data in the Spark context and tuned their performance.
- Created Kafka topics and partitions, and wrote custom partitioner classes.
- Defined job flows in Oozie to automate data loading into HDFS and Pig.
- Involved in creating POCs to ingest and process streaming data using Spark Streaming and Kafka.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins in MapReduce.
- Created branches in GitHub, pushed the code and deployed production releases through Jenkins.
- Involved in the complete SDLC of a big data project, including requirement analysis, design, coding, testing and production.
Environment: Hadoop, HDFS, Cloudera, Hortonworks, Pig, Hive, Spark, SparkSQL, Scala, HBase, Oozie, Sqoop, Flume, Kafka, Linux, Java, Maven, JUnit, GitHub, Jenkins.
Sr. Big Data Engineer
- Involved in design, development, integration, deployment, production support and other technical aspects of developing and modifying the applications.
- Created Informatica source-to-target mappings using different transformations to implement business rules and fulfill the data integration requirements.
- Prepared DML to perform audit & error handling.
- Provided project estimates, coordinated the development efforts and discussions with the business partners, updated status reports and handled logistics to enable smooth functioning of the project and meet all the deadlines.
- Used SQL Developer to query the source/target tables to validate the SQL and Lookup overrides.
- Performed unit testing and was responsible for system integration testing and regression testing until release to production.
- Installed and configured Hadoop MapReduce and HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in defining job flows and in managing and reviewing Hadoop log files.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Supported MapReduce programs running on the cluster.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Working knowledge of modifying and executing UNIX shell scripts. Involved in web testing using SoapUI for different member and provider portals.
- Involved in building and maintaining test plans, test suites, test cases, defects and test scripts.
- Conducted functional, system, data and regression testing.
- Involved in bug review meetings and participated in weekly meetings with the management team.
Environment: Hadoop, HDFS, Cloudera, Pig, Hive, Spark, SparkSQL, Scala, HBase, Oozie, Sqoop, Flume, Kafka, Linux, Java, Maven, JUnit, GitHub, Jenkins.
- Involved in requirement analysis, design and development of the application using Java technologies.
- Developed the login screen so that the application can be accessed only by authorized and authenticated administrators.
- Used HTML, CSS and JSPs to design and develop the front end, and JavaScript to perform user validation.
- Designed, developed and configured server-side J2EE components such as EJBs, JavaBeans and Servlets.
- Involved in creating tables, functions, triggers, sequences and stored procedures in PL/SQL.
- Implemented business logic by developing Session Beans.
- Involved in developing JSP pages using Struts custom tags, jQuery and the Tiles framework.
- Used Hibernate as the ORM and PL/SQL for handling database processing.
- Used the JDBC API to communicate with the database.
- Developed the application using the Waterfall software development methodology.
- Involved in technical documentation of the project.
Environment: Java, HTML, CSS, JSP, Servlets, EJB, jQuery, JDBC, Hibernate, PL/SQL.