- 8+ years of experience in various IT technologies, including 4 years of hands-on experience in Big Data technologies.
- Extensive implementation experience with a wide array of tools in the Big Data stack, including HDFS, Spark, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Kafka, ZooKeeper, and HBase.
- Apache Spark
- Spring Boot
- Elasticsearch
- JavaScript
- Shell Scripting
- DynamoDB
- Apache Tomcat
Confidential, Rochester, MN
- Used the Spark API on Cloudera Hadoop YARN to perform analytics on data.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Worked on batch processing of data sources using Apache Spark and Elasticsearch.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Migrated Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance.
- Experience pushing data from Impala to MicroStrategy.
- Created scripts for importing data from DB2 into HDFS/Hive using Sqoop.
- Loaded data from different sources into Hive using the Talend tool.
- Implemented real-time data ingestion using Kafka.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Used all major ETL transformations to load tables through Informatica mappings.
- Worked with sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
- Managed and reviewed Hadoop log files; tested and reported defects within an Agile methodology.
- Used Apache Maven extensively while developing MapReduce programs.
- Coordinated with business teams for UAT sign-off.
- Worked on a Hadoop cluster using big data analytic tools including Pig, Hive, and MapReduce.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Worked in an AWS environment developing and deploying custom Hadoop applications.
- Extracted and stored data in DynamoDB for the Hadoop application.
- Generated pipelines using PySpark and Hive.
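One of the Hive optimizations listed above, the map-side join, can be sketched in plain Python; the table contents below are illustrative, not from any real dataset:

```python
# Map-side (broadcast) join sketch: the small dimension table is loaded
# into memory as a dict, and each fact row is joined without a shuffle.
# Keys and values here are hypothetical examples.

def map_side_join(fact_rows, dim_table):
    """Join each (key, value) fact row against an in-memory dim table."""
    dim = dict(dim_table)  # small table fits in every mapper's memory
    for key, value in fact_rows:
        if key in dim:  # inner-join semantics: unmatched keys are dropped
            yield (key, value, dim[key])

facts = [("c1", 100), ("c2", 250), ("c3", 75)]
dims = [("c1", "Rochester"), ("c2", "Minneapolis")]
joined = list(map_side_join(facts, dims))
```

Because the small table is replicated to every mapper, the join avoids the shuffle phase entirely, which is what makes it a Hive performance enhancement.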
- Created HBase tables to store various data formats of PII data coming from different portfolios
- Experience developing Java applications using Spring Boot.
- Involved in loading data from the Linux file system into HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience processing unstructured data using Pig and Hive.
- Developed Spark scripts using Python.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Assisted in monitoring the Hadoop cluster using tools like Nagios and Ganglia.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Developed Docker images, containers, and a registry.
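The HBase tables above stored PII data from multiple portfolios; a common row-key design for that workload is a salted composite key, sketched here in Python (portfolio names and the bucket count are hypothetical):

```python
# Sketch of a salted HBase row-key scheme for PII data from multiple
# portfolios. The salt prefix spreads sequential writes across region
# servers to avoid hot-spotting. All names and counts are illustrative.
import hashlib

NUM_SALT_BUCKETS = 8  # hypothetical number of regions to spread load over

def make_row_key(portfolio, customer_id):
    """Build 'salt|portfolio|customer_id' as the HBase row key."""
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    salt = int(digest, 16) % NUM_SALT_BUCKETS  # deterministic bucket
    return f"{salt:02d}|{portfolio}|{customer_id}"

key = make_row_key("retail", "CUST-0001")
```

The salt is derived from the ID rather than random, so the same customer always hashes to the same region and point lookups stay cheap.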
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Involved in loading data from the UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Imported and exported data into HDFS and Hive using Sqoop.
- Used Cassandra CQL and Java APIs to retrieve data from Cassandra tables.
- Responsible for cluster maintenance: adding and removing cluster nodes, monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Worked hands-on with the ETL process.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Extracted the data from Teradata into HDFS using Sqoop.
- Analyzed the data by running Hive queries and Pig scripts to identify user behavior such as shopping enthusiasts, travelers, and music lovers.
- Exported the patterns analyzed back into Teradata using Sqoop.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Installed the Oozie workflow engine to run multiple Hive jobs.
- Developed Hive queries to process the data and generate data cubes for visualization.
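The user-behavior segmentation described above can be illustrated with a minimal Python sketch; the segment names match the bullet, but the purchase categories and thresholds are hypothetical:

```python
# Minimal sketch of the behavior-segmentation logic that the Hive/Pig
# jobs implemented. Category names and thresholds are hypothetical.

def classify_user(purchases):
    """Map a user's purchase counts by category to behavior segments."""
    segments = set()
    if purchases.get("apparel", 0) + purchases.get("electronics", 0) >= 5:
        segments.add("shopping enthusiast")
    if purchases.get("flights", 0) + purchases.get("hotels", 0) >= 2:
        segments.add("traveler")
    if purchases.get("music", 0) >= 3:
        segments.add("music lover")
    return segments

profile = {"apparel": 4, "electronics": 2, "flights": 1, "music": 3}
segments = classify_user(profile)
```

In the actual pipeline the same rule logic would run as Hive `CASE` expressions or a Pig UDF over the full user table before the results were Sqoop-exported back to Teradata.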
- Developed, tested, and debugged the Java, JSP, and EJB components using Eclipse.
- Implemented J2EE standards and MVC2 architecture using the Struts framework.
- Developed web components using JSP, servlets, and JDBC.
- Created use cases and class and sequence diagrams for application analysis and design.
- Implemented servlets, JSP, and Ajax to design the user interface.
- Used JSP, JavaScript, HTML5, and CSS for manipulating, validating, and customizing error messages in the user interface.
- Used JBoss for EJB and JTA, and for caching and clustering purposes.
- Used EJB session beans to implement the business logic, JMS for sending updates to various other applications, and MDBs for routing priority requests.
- Wrote web services using SOAP for sending and receiving data from the external interface.
- Used XSL/XSLT for transforming and displaying reports; developed XML schemas.
- Developed web-based reporting for a monitoring system with HTML and Tiles using the Struts framework.
- Used design patterns such as Business Delegate, Service Locator, Model-View-Controller (MVC), Session, and DAO.
- Involved in fixing defects and unit testing with test cases using JUnit
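The MVC2 / front-controller dispatch behind the Struts work above can be sketched in Python for brevity; the action names, handlers, and view names are illustrative, not from the actual application:

```python
# Language-agnostic sketch of MVC2 front-controller dispatch: a single
# controller maps an action name to a handler, and each handler returns
# a (view, model) pair. All names below are hypothetical examples.

def list_orders(params):
    return ("orders.jsp", {"orders": ["A-100", "A-101"]})

def show_order(params):
    return ("order_detail.jsp", {"order_id": params["id"]})

# the equivalent of struts-config.xml action mappings
ACTION_MAP = {"orders.list": list_orders, "orders.show": show_order}

def front_controller(action, params):
    """Dispatch a request to its handler, or to an error view."""
    handler = ACTION_MAP.get(action)
    if handler is None:
        return ("error404.jsp", {})
    return handler(params)

view, model = front_controller("orders.show", {"id": "A-101"})
```

Centralizing dispatch this way is what lets the framework apply validation, logging, and security in one place before any handler runs.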
- Developed stored procedures and triggers in PL/SQL
- Implemented server-side programs using servlets and JSP.
- Designed, developed, and validated the user interface using HTML, JavaScript, XML, and CSS.
- Implemented MVC using the Struts framework.
- Handled database access by implementing a controller servlet.
- Implemented PL/SQL stored procedures and triggers.
- Used JDBC prepared statements, called from servlets, for database access.
- Designed and documented the stored procedures.
- Widely used HTML for web-based design.
- Worked on the database interaction layer, updating and retrieving data from the Oracle database by writing stored procedures.
- Used Spring framework dependency injection and integration with Hibernate; involved in writing JUnit test cases.
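The prepared-statement pattern from the JDBC work above can be shown with Python's built-in sqlite3 module as a runnable stand-in; the table schema and rows are illustrative:

```python
# Parameterized-query sketch: '?' placeholders bind values separately
# from the SQL text, the same idea as a JDBC PreparedStatement, which
# prevents SQL injection. Schema and data here are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

# the value is bound, never concatenated into the SQL string
row = conn.execute("SELECT name FROM users WHERE id = ?", (2,)).fetchone()
conn.close()
```

In the JDBC version the same statement would be prepared once and re-executed with different bound values, which also lets the database cache the query plan.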