- Over 8 years of professional IT experience in all phases of the Software Development Life Cycle, including hands-on experience in Java/J2EE technologies and Big Data analytics.
- 4+ years of experience in ingestion, storage, querying, processing, and analysis of Big Data, with hands-on experience in Big Data ecosystem technologies such as MapReduce, Hive, Spark, Cloudera Navigator, Mahout, HBase, Pig, ZooKeeper, Sqoop, Flume, Oozie, and HDFS.
- Good at preparing low-level design documents and system specifications.
- Experience working with BI teams to translate Big Data requirements into Hadoop-centric solutions.
- Extensive experience working with Teradata, Oracle, Netezza, SQL Server, DB2, and MySQL databases.
- Excellent understanding and knowledge of NoSQL databases like MongoDB, HBase, and Cassandra.
- Solid experience creating PL/SQL packages, procedures, functions, triggers, views, and exception handling for retrieving, manipulating, checking, and migrating complex data sets in Oracle.
- Strong command of Informatica PowerCenter, Oracle, Vertica, Hive, SQL Server, shell scripting, and QlikView.
- Very good understanding and working knowledge of object-oriented programming (OOP), Python, and Scala.
- In-depth knowledge of Hadoop architecture and its components such as HDFS, MapReduce, Hadoop Gen2 Federation, High Availability, and YARN, with a good understanding of workload management, scalability, and distributed platform architectures.
- Experienced in developing and implementing MapReduce programs on Hadoop to meet Big Data requirements (a minimal sketch follows this summary).
- Experience with common Big Data technologies such as Cassandra, Hadoop, HBase, MongoDB, and Impala.
- Hands on Experience in Big Data ingestion tools like Flume and Sqoop.
- Experience with the Cloudera distribution (CDH) and Hortonworks Data Platform (HDP).
- Good hands-on experience developing Hadoop applications on Spark, using Scala for functional and object-oriented programming.
- Experience working with different kinds of data files such as XML, JSON, Parquet, and Avro, as well as databases.
- Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa.
- Experience migrating ETL transformations to Pig Latin scripts, including joins and other transformations.
- Extensive experience developing Pig Latin scripts and using Hive Query Language for data analytics.
- Expertise in writing Hive queries, Pig, and MapReduce scripts, and in loading large volumes of data from the local file system and HDFS into Hive.
- Experienced in debugging MapReduce jobs using Counters and MRUnit testing.
- Experience developing against NoSQL databases using CRUD operations, sharding, indexing, and replication.
- Experienced in installing and maintaining Cassandra by configuring the cassandra.yaml file as per requirements.
- Performed reads and writes in Cassandra from a web application using Java JDBC connectivity.
- Knowledge of processing and analyzing real-time data streams/flows using Kafka and HBase.
- Good understanding of MPP databases such as HP Vertica, Greenplum and Impala.
- Good understanding of HDFS design, daemons, and HDFS High Availability (HA).
- Expertise in using job scheduling and monitoring tools like Oozie.
- Good understanding of Spark MLlib algorithms such as classification, clustering, and regression.
- Good understanding of Spark Streaming with Kafka for real-time processing.
- Extensive experience working with Spark components such as RDD transformations, Spark MLlib, and Spark SQL.
- Knowledge of data warehousing and ETL tools like Informatica, Talend.
- Experience developing and scheduling ETL workflows in Hadoop using Oozie, along with deploying and managing Hadoop clusters using the Cloudera and Hortonworks distributions.
- Experienced in monitoring cluster status using Cloudera Manager, Ambari, and Ganglia.
- Experience across the layers of the Hadoop framework: storage (HDFS), analysis (Pig and Hive), and engineering (jobs and workflows); developed ETL processes to load data from multiple sources into HDFS using Sqoop and Pig, with Oozie for workflow automation.
- Hands-on experience streaming live data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
- Good experience working with cloud environments such as Amazon Web Services (EC2 and S3).
- Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, EJB.
- Expert in building Microsoft Power BI and Tableau reports and dashboards and publishing them to end users for executive-level business decisions.
- Worked with various tools and IDEs such as Eclipse, IBM Rational, Visio, Apache Ant, MS Office, PL/SQL Developer, and SQL*Plus.
- Ability to meet deadlines and handle multiple tasks, decisive with strong leadership qualities, flexible in work schedules and possess good communication skills.
- Team player, motivated and able to grasp things quickly with analytical and problem-solving skills.
- Strong technical, oral, and written communication skills.
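To illustrate the MapReduce experience referenced above, the sketch below shows a minimal Hadoop MapReduce job in Java (a word-count style aggregation). The class names, paths, and logic are illustrative placeholders only and are not taken from any of the engagements described in this resume.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative word-count job: the canonical minimal MapReduce program.
public class WordCount {

    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (token, 1) for every whitespace-separated token in the line.
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts emitted by the mappers for each token.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A job like this is typically packaged as a JAR and submitted with `hadoop jar`, with the input and output arguments pointing at HDFS directories.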
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Solr, Storm, Drill, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet and Snappy.
Hadoop Distributions: Cloudera, MapR, Hortonworks, IBM BigInsights
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL, and Oracle
RDBMS: Teradata, Oracle PL/SQL, MS SQL Server, MySQL and DB2
Operating systems: UNIX, LINUX, Mac OS, and Windows
ETL Tools: Informatica Power center
Reporting tools: Tableau
Confidential, Minneapolis, MN
Sr. Hadoop Developer
- Gathered User requirements and designed technical and functional specifications.
- Installed, configured, and maintained Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements (see the sketch at the end of this section); used Sqoop to import and export data from Oracle and MySQL.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) written in Python.
- Imported and exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Worked on importing and exporting data into HDFS and Hive using Sqoop.
- Used Flume to handle streaming data and loaded the data into Hadoop cluster.
- Developed and executed Hive queries for de-normalizing the data.
- Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
- Executed Hive queries using the Hive command line, the Hue web GUI, and Impala to read, write, and query data in HBase.
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Worked on a POC comparing Impala's processing time against Apache Hive's for batch applications, with the goal of implementing Impala in the project.
- Worked on a cluster of 130 nodes.
- Designed an Apache Airflow entity-resolution module for data ingestion into Microsoft SQL Server.
- Developed a batch processing pipeline using Python and Airflow; scheduled Spark jobs with Airflow.
- Involved in writing, testing, and running MapReduce pipelines using Apache Crunch .
- Managed and reviewed Hadoop log files, analyzed SQL scripts, and designed solutions for the process using Spark.
- Created reports in Tableau to visualize the resulting data sets, and tested native Drill, Impala, and Spark connectors.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection tests, permission checks, and performance analysis.
- Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
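As a companion to the Hive analysis work described in this section, the following is a minimal sketch of querying Hive from Java over JDBC. The HiveServer2 host, credentials, database, table, and column names are hypothetical placeholders, not details from the actual project.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Illustrative Hive query over JDBC against HiveServer2 (default port 10000).
public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver-host:10000/default", "hive_user", "");
             Statement stmt = conn.createStatement()) {

            // Query a partitioned table, filtering on the partition column.
            ResultSet rs = stmt.executeQuery(
                "SELECT customer_id, SUM(amount) AS total " +
                "FROM sales WHERE load_date = '2016-01-01' " +
                "GROUP BY customer_id");

            while (rs.next()) {
                System.out.println(rs.getString("customer_id") + "\t" + rs.getString("total"));
            }
        }
    }
}
```

Filtering on the partition column (the load_date predicate above) keeps the scan limited to the relevant partitions of the table.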
Confidential, San Diego, CA
- Worked on a live 24-node Hadoop cluster running HDP 2.2.
- Built data import and export jobs to copy data between RDBMS and HDFS using Sqoop.
- Worked with incremental-load Sqoop jobs to populate HAWQ external tables and load them into internal tables.
- Created external and internal tables using HAWQ.
- Worked with the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
- Hands-on experience in the various phases of a Big Data application: data ingestion, data analytics, and data visualization.
- Transferred data from RDBMS to HDFS and Hive tables using Sqoop.
- Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
- Very well versed in workflow scheduling and monitoring tools such as Oozie, Hue and Zookeeper.
- Experience in working with Flume to load the log data from multiple sources directly into HDFS.
- Installed and configured MapReduce, Hive, and HDFS; implemented CDH5 and HDP clusters.
- Assisted with performance tuning, monitoring, and troubleshooting.
- Experience with data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Experience manipulating streaming data into the cluster through Kafka and Spark Streaming (see the sketch at the end of this section).
- Optimized HiveQL/Pig scripts by using execution engines such as Tez and Spark.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Worked with and learned a great deal from AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
- Experienced in reviewing Hadoop log files to detect failures.
- Performed benchmarking of the NoSQL databases Cassandra and HBase.
- Worked with Pig, the NoSQL database HBase, and Sqoop for analyzing the Hadoop cluster as well as Big Data.
- Knowledge of workflow schedulers such as Oozie, crontab, and Autosys.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Creating Hive tables and working on them for data analysis to meet the business requirements.
- Developed a data pipeline using Spark and Hive to ingest, transform, and analyze data.
- Experience using SequenceFile, RCFile, Avro, and HAR file formats.
- Hands-on experience writing Pig scripts to tokenize sensitive information using Protegrity.
- Used Flume to dump application server logs into HDFS.
- Automated backups with shell scripts on Linux to transfer data to an S3 bucket.
- Experience in UNIX Shell scripting.
- Hands on experience using HP ALM. Created test cases and uploaded into HP ALM.
- Automated incremental loads to load data into production cluster.
Environment: Hadoop, MapReduce, AWS, HDFS, Hive, HBASE, Sqoop, Pig, Flume, Oracle, Teradata, PL/SQL, Java, Shell Scripting, HP ALM.
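As a sketch of the Kafka and Spark Streaming work referenced in this section, the Java example below consumes a Kafka topic using the Spark Streaming direct approach (the spark-streaming-kafka-0-10 integration). The broker address, topic, and consumer group are hypothetical placeholders.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

// Illustrative Kafka-to-Spark-Streaming consumer with placeholder connection details.
public class KafkaStreamingExample {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-streaming-example");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "kafka-broker:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "log-consumer-group");
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream from the Kafka topic into Spark Streaming micro-batches.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(Arrays.asList("app-logs"), kafkaParams));

        // Pull out the message payloads and count records per batch.
        JavaDStream<String> values = stream.map(ConsumerRecord::value);
        values.count().print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```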
Confidential, Santa Monica, CA
- In-depth understanding of Hadoop architecture and its components, such as HDFS, the Application Master, Node Manager, Resource Manager, NameNode, DataNode, and MapReduce concepts.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Imported required tables from RDBMS to HDFS using Sqoop and used Storm and Kafka to get real time streaming of data into HBase.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Good experience with the NoSQL database HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Wrote MapReduce code that takes log files as input, parses them, and structures them in tabular format to facilitate effective querying of the log data.
- Developed Java code to generate, compare, and merge Avro schema files.
- Developed complex MapReduce streaming jobs in Java that are implemented using Hive and Pig.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Applied Hive optimization techniques for joins and followed best practices when writing Hive scripts in HiveQL.
- Imported and exported data into HDFS and Hive using Sqoop.
- Wrote Hive queries to extract the processed data.
- Developed a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Implemented Spark using Scala, utilizing the Spark Core, Spark Streaming, and Spark SQL APIs for faster data processing instead of MapReduce in Java.
- Used Spark SQL to load JSON data, create a schema RDD, and load it into Hive tables; handled structured data using Spark SQL (see the sketch at the end of this section).
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Created HBase tables to store data in variable formats coming from different legacy systems.
- Used Hive for transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Good understanding of Cassandra architecture, replication strategies, gossip, snitches, etc.
- Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery, and backup.
Environment: Hadoop, HDFS, MapReduce, Hive, Python, PIG, Java, Oozie, HBASE, Sqoop, Flume, MySQL.
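As a sketch of the Spark SQL JSON-to-Hive flow referenced in this section, the Java example below reads JSON from HDFS, applies a SQL transformation, and saves the result as a Hive table. It is shown with the Spark 2.x SparkSession API rather than the schema-RDD API mentioned in the bullet; the HDFS path, view, and table names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

// Illustrative Spark SQL flow: read JSON, transform with SQL, persist to Hive.
public class JsonToHiveExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("json-to-hive-example")
                .enableHiveSupport() // allows saveAsTable to write managed Hive tables
                .getOrCreate();

        // Schema is inferred from the JSON records.
        Dataset<Row> events = spark.read().json("hdfs:///data/raw/events/");

        // Register a temporary view so the data can be queried with SQL.
        events.createOrReplaceTempView("events_raw");
        Dataset<Row> cleaned = spark.sql(
            "SELECT user_id, event_type, event_time FROM events_raw WHERE user_id IS NOT NULL");

        // Persist the result as a Hive table for downstream reporting.
        cleaned.write().mode(SaveMode.Overwrite).saveAsTable("analytics.events_clean");

        spark.stop();
    }
}
```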
Confidential, Kansas City, MO
- Involved in analysis, design, development, integration, and testing of application modules following the Agile/Scrum methodology; participated in estimating the size of backlog items, daily scrums, and translating backlog items into engineering designs and logical units of work (tasks).
- Used the Spring framework for IoC, JDBC, ORM, AOP, and Spring Security to implement the business layer.
- Developed and Consumed Web services securely using JAX-WS API and tested using SOAP UI.
- Extensively used Action, DispatchAction, ActionForms, Struts tag libraries, and Struts configuration from the Struts framework.
- Extensively used Hibernate Query Language for data retrieval from the database and processed the data in the business methods.
- Developed pages using JSP, JSTL, Spring tags, jQuery, and JavaScript, and used jQuery to make AJAX calls.
- Used Jenkins continuous integration tool to do the deployments.
- Worked on JDBC for database connections.
- Worked on multithreaded middleware using socket programming to introduce a whole set of new business rules, applying OOP design principles.
- Involved in implementing Java multithreading concepts.
- Developed several REST web services supporting both XML and JSON to perform tasks such as demand response management (see the sketch at the end of this section).
- Used Servlets, Java, and Spring for server-side business logic.
- Implemented logging functionality using Log4j and internal logging APIs.
- Used JUnit for server-side testing.
- Used Maven build tools and SVN for version control.
- Developed the application front end using the Bootstrap, AngularJS, and Node.js frameworks.
- Implemented SOA architecture using Enterprise Service Bus (ESB).
- Used IBM MQ Series as the JMS provider.
- Responsible for writing SQL Queries and Procedures using DB2.
- Implemented connections to Oracle and MySQL databases using Hibernate ORM; configured Hibernate and entities from scratch using annotations.
- Used JSP pages through a Servlet controller for the client-side view.
- Followed Java/J2EE best practices to minimize unnecessary object creation.
- Implemented RESTful web services with the Struts framework.
- Verified them with the JUnit testing framework.
- Working experience using an Oracle 10g backend database.
- Used JMS queues to develop an internal messaging system.
- Developed the UML Use Cases, Activity, Sequence and Class diagrams using Rational Rose.
- Developed Java, JDBC, and Java Beans using JBuilder IDE.
- Developed JSP pages and Servlets for customer maintenance.
- Apache Tomcat Server was used to deploy the application.
- Involved in building the modules in a Linux environment with Ant scripts.
- Used Resource Manager to schedule jobs on the Unix server.
- Performed Unit testing, Integration testing for all the modules of the system.
- Developed JavaBean components utilizing AWT and Swing classes.
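As an illustration of the REST services mentioned in this section, the sketch below uses standard JAX-RS annotations (a generic approach, not necessarily the Struts-based one noted above) to expose a resource that responds with either JSON or XML based on the client's Accept header. It assumes a JAX-RS runtime with a JSON provider on the classpath; the resource path and fields are hypothetical.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.xml.bind.annotation.XmlRootElement;

// Illustrative JAX-RS resource that can be marshalled to either XML or JSON.
@Path("/demand-response")
public class DemandResponseResource {

    @XmlRootElement
    public static class EventStatus {
        public String eventId;
        public String status;

        public EventStatus() { }

        public EventStatus(String eventId, String status) {
            this.eventId = eventId;
            this.status = status;
        }
    }

    @GET
    @Path("/{eventId}")
    @Produces({MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML})
    public EventStatus getStatus(@PathParam("eventId") String eventId) {
        // A real service would consult the business layer here; a fixed
        // status is returned purely for illustration.
        return new EventStatus(eventId, "ACTIVE");
    }
}
```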