- 8 years of overall experience in the IT industry, including Big Data technologies and multi-tiered web applications using Hadoop, Spark, Hive, Pig, Sqoop, J2EE (Spring, JSP, Servlets), JDBC, HTML, CSS, and JavaScript (AngularJS).
- 4+ years of comprehensive experience in Big Data analytics and its ecosystem components in CentOS and RHEL Linux environments.
- Working knowledge of the AWS environment and Spark on AWS, with strong experience in cloud computing platforms such as AWS.
- Migrated an existing on-premises application to AWS, using services such as EC2 and S3 for small-data-set processing and storage; experienced in maintaining Hadoop clusters on AWS EMR.
- Hands-on experience with Spark Core, Spark SQL, and Spark Streaming, and in creating and handling DataFrames in Spark with Scala.
- Experience with NoSQL databases, including table row key design, loading and retrieving data for real-time processing, and performance improvements based on data access patterns.
- Used Spark Streaming APIs to perform on-the-fly transformations and actions to build a common learner data model that receives data from Kafka in near real time.
- Extensive experience with Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and MapReduce concepts.
- Experience in building large-scale, highly available web applications; working knowledge of web services and other integration patterns.
- Developed simple to complex MapReduce and streaming jobs in Java and Scala.
- Developed Hive scripts to perform ad hoc analysis for end-user/analyst requirements.
- Used EMR with Hive to handle lower-priority bulk ETL jobs.
- Hands-on experience with NoSQL databases such as MongoDB.
- Experience in Java development using J2EE, J2SE, Servlets, JSP, EJB, and JDBC.
- Implementation knowledge of enterprise, web, and client-server applications using Java and J2EE.
- Experience in analyzing data using HQL, Pig Latin, and custom MapReduce programs in Java.
- Experienced with data formats such as JSON, Parquet, Avro, RC, and ORC.
- Utilized Flume to analyze log files and write them into HDFS.
- Experience importing and exporting data between HDFS and RDBMSs with Sqoop, and migrating data according to client requirements.
- Experience with data ingestion, storage, processing, and analysis of big data.
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
- Used GitHub as a version control tool to push and pull updated code from the repository.
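As an illustration of the MapReduce-style aggregation pattern behind the jobs described above, here is a minimal sketch in plain Java collections standing in for a cluster; the class name and sample data are hypothetical, not from an actual project.

```java
import java.util.*;

// Minimal sketch of the map -> group -> reduce pattern that MapReduce
// and Spark jobs follow, using plain Java collections (no cluster).
public class WordCountSketch {
    public static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {                       // map phase: line -> words
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);  // reduce phase: sum per key
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            wordCount(List.of("big data big wins", "data pipelines"));
        System.out.println(counts.get("big")); // 2
    }
}
```

In a real MapReduce or Spark job, the grouping step is a distributed shuffle rather than a local `HashMap`, but the per-key reduce logic is the same shape.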
Programming Languages: Java, Scala, Unix Shell Scripting, PL/SQL
J2EE Technologies: Spring, Servlets, JSP, JDBC, Hibernate
Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, Spark, Kafka, AWS EMR, Kinesis, Sqoop, Impala, Oozie, Zookeeper, Flume
DBMS: Oracle 11g, SQL Server, MySQL, IBM DB2
Modeling Tools: UML on Rational Rose 4.0
IDEs: Eclipse, NetBeans, WinSCP, Visual Studio, and IntelliJ
Operating Systems: UNIX, Windows, RHEL, Solaris, CentOS
Version and Source Control: TFS, Git, SVN, and IBM Rational ClearCase
Servers: Apache Tomcat, WebLogic, and WebSphere
Frameworks: MVC, Spring, Struts, Log4j, JUnit, Maven, ANT
Confidential, Brentwood, TN
Roles & Responsibilities:
- Experience in job management using the Fair Scheduler; developed job-processing scripts using Oozie workflows.
- Used Spark and Hive to implement the transformations needed to join daily ingested data to historical data.
- Used Spark Streaming APIs to perform on-the-fly transformations and actions to build the common learner data model, which receives data from Kafka in near real time.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used the Spark API over an EMR Hadoop YARN cluster to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL/Datasets and RDDs/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory usage.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures such as text analytics and processing using Spark's in-memory computing capabilities.
- Experienced in handling large datasets during the ingestion process itself, using partitions, Spark's in-memory capabilities, broadcasts, and effective, efficient joins and transformations.
- Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments, with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores, for data access and analysis.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to adopting Impala in the project.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Involved in creating Hive tables and in loading and analyzing data using Hive queries.
- Developed Hive queries to process data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Good experience with Talend Open Studio for designing ETL jobs for data processing.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Involved in file movements between HDFS and AWS S3.
- Extensively worked with S3 bucket in AWS.
- Used reporting tools such as Tableau, connected to Hive, to generate daily data reports.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Environment: Hadoop YARN, Spark 1.6, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop 1.4.6, Impala, Tableau, Talend, Oozie, Control-M, Java, AWS S3, Oracle 12c, Linux.
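The broadcast-based joins mentioned above can be sketched with plain Java collections standing in for Spark's broadcast variables; the class, record shapes, and sample data here are illustrative assumptions, not project code.

```java
import java.util.*;

// Sketch of a map-side (broadcast) join: the small dimension table is
// held in memory as a lookup map on every worker, so the large fact
// data never has to be shuffled. Plain Java stands in for Spark here.
public class BroadcastJoinSketch {
    public static List<String> joinSales(List<int[]> sales, Map<Integer, String> customers) {
        List<String> joined = new ArrayList<>();
        for (int[] sale : sales) {                 // sale = {customerId, amount}
            String name = customers.get(sale[0]);  // lookup in the broadcast side
            if (name != null) {                    // inner-join semantics: drop misses
                joined.add(name + ":" + sale[1]);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<Integer, String> customers = Map.of(1, "alice", 2, "bob"); // small side
        List<int[]> sales = List.of(new int[]{1, 10}, new int[]{3, 5}); // large side
        System.out.println(joinSales(sales, customers)); // [alice:10]
    }
}
```

The design point is that shipping the small side to every task trades a little memory per executor for the elimination of an expensive shuffle of the large side.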
Confidential, Dallas, Texas
Roles & Responsibilities:
- Involved in the complete big data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing the data in HDFS.
- Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers.
- Developed data pipeline using Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables.
- Involved in running Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
- Set up Apache NiFi to transfer structured and streaming data into HDFS.
- Ingested streaming data into Kafka with Apache NiFi.
- Collected JSON data from an HTTP source and developed Spark APIs to perform inserts and updates on Hive tables.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Developed shell scripts to run Hive scripts in Hive and Impala.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
Environment: Scala, HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, NiFi, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting.
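The partitioned-table loading described above can be sketched in plain Java: grouping records by a partition column mirrors the one-directory-per-partition layout (e.g. `.../table/dt=2024-01-01/`) that Hive writes. The table name, partition column, and data below are hypothetical.

```java
import java.util.*;

// Sketch of Hive-style partitioned layout: records are grouped by
// their partition column, producing one "directory" per partition
// value, as Hive does for partitioned tables.
public class PartitionSketch {
    public static Map<String, List<String>> partitionPaths(List<String[]> events, String table) {
        Map<String, List<String>> layout = new TreeMap<>();
        for (String[] event : events) {            // event = {dt, payload}
            String path = "/warehouse/" + table + "/dt=" + event[0];
            layout.computeIfAbsent(path, k -> new ArrayList<>()).add(event[1]);
        }
        return layout;
    }

    public static void main(String[] args) {
        List<String[]> events = List.of(
            new String[]{"2024-01-01", "a"},
            new String[]{"2024-01-01", "b"},
            new String[]{"2024-01-02", "c"});
        System.out.println(partitionPaths(events, "events").keySet());
        // [/warehouse/events/dt=2024-01-01, /warehouse/events/dt=2024-01-02]
    }
}
```

Partition pruning is the payoff: a query filtered on `dt` only has to read the matching directories rather than scanning the whole table.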
Confidential, Irvine, CA
- Wrote Hive queries for data analysis to meet business requirements.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Migrated data between RDBMS and HDFS/Hive with Sqoop.
- Hands-on use of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
- Used Sqoop to import and export data among HDFS, the MySQL database, and Hive.
- Responsible for generating actionable insights from complex data to drive real business results for various application teams.
- Developed Spark scripts using the Scala shell as per requirements.
- Developed and implemented API services in Spark using Scala.
- Implemented POCs on migrating to Spark Streaming to process live data.
- Ingested data from RDBMSs, performed data transformations, and exported the transformed data per business requirements.
- Rewrote existing MapReduce jobs to use new features and improvements, achieving faster results.
- Added, decommissioned, and rebalanced nodes.
- Created a POC to store server log data in Elasticsearch to generate system alert metrics.
- Coordinated continuously with the QA, production support, and deployment teams.
Environment: Hive, SQL, Pig, Flume, Kafka, Sqoop, Scala, Java, Shell Scripting, UNIX Scripting, Spark, Teradata, Pentaho, Oozie, Talend.
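The Spark Streaming migration POCs mentioned above follow the micro-batch model: an unbounded stream is cut into small batches, each processed with ordinary batch logic. A minimal sketch in plain Java (the per-batch sum and all names are illustrative assumptions):

```java
import java.util.*;

// Sketch of micro-batching, the model behind Spark Streaming: the
// incoming stream is chopped into fixed-size batches and each batch
// is handled with normal batch logic (here, a per-batch sum).
public class MicroBatchSketch {
    public static List<Integer> process(Iterator<Integer> stream, int batchSize) {
        List<Integer> batchSums = new ArrayList<>();
        while (stream.hasNext()) {
            int sum = 0;
            for (int i = 0; i < batchSize && stream.hasNext(); i++) {
                sum += stream.next();              // accumulate one micro-batch
            }
            batchSums.add(sum);                    // emit one result per batch
        }
        return batchSums;
    }

    public static void main(String[] args) {
        List<Integer> sums = process(List.of(1, 2, 3, 4, 5).iterator(), 2);
        System.out.println(sums); // [3, 7, 5]
    }
}
```

In real Spark Streaming the batch boundary is a time interval (the batch interval tuned in performance work) rather than a fixed element count, but the processing model is the same.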
Roles & Responsibilities:
- Implemented the application using Agile methodology. Involved in daily scrum and sprint planning meetings.
- Actively involved in analysis, detail design, development, bug fixing and enhancement.
- Drove the technical design of the application by collecting requirements from the functional unit during the design phase of the SDLC.
- Developed microservices using RESTful services to provide full CRUD capabilities.
- Created requirement documents and designed requirements using UML diagrams, including class diagrams and use case diagrams, for new enhancements.
- Used the JBoss application server for deployment of applications.
- Developed communication among SOA services.
- Involved in creation of both service and client code for JAX-WS and used SOAPUI to generate proxy code from the WSDL to consume the remote service.
- Designed Node.js application components through Express.
- Implemented AJAX functionality to speed up the web application.
- Created a single-page application that loads multiple views using route services, using the AngularJS framework to make it more dynamic and improve the user experience.
- Implemented features with AngularJS, taking advantage of two-way data binding and templates.
- Designed the user interface with Java Swing, keeping business standards in mind.
- Developed Static and Dynamic pages using JSP and Servlets.
- Used the Hibernate persistence strategy to interact with the database.
- Worked with SessionFactory, ORM mapping, transactions, and HQL in the Hibernate framework.
- Used RESTful web services to send data to and receive data from different applications.
- Wrote client-side and server-side validations using JavaScript.
- Wrote stored procedures and complex SQL queries for backend database operations.
- Devised logging mechanism using Log4j.
- Used GitHub as the version control system.
- Created tracking sheets for tasks and generated timely reports on task progress.
Environment: Java, J2EE, Java Swing, HTML, JavaScript, AngularJS, Node.js, JDBC, JSP, Servlet, UML, Hibernate, XML, JBoss, SDLC methodologies, Log4j, GitHub, RESTful, JAX-RS, JAX-WS, Eclipse IDE.
- Experience in coding server-side Servlets that receive requests from clients and process them by interacting with the Oracle database.
- Coded Java Servlets to control and maintain session state and handle user requests.
- Used JDBC to connect to the backend database and developed stored procedures.
- Developed code to handle web requests involving Request Handlers, Business Objects, and Data Access Objects.
- Created JSP pages, including JSP custom tags and other methods of JavaBean presentation, and all HTML and graphical aspects of the site's user interface.
- Used XML for mapping the pages and classes and to transfer data universally among different data sources.
- Worked in unit testing and documentation.
- Hands-on experience with the J2EE Struts framework. Implemented Spring Model-View-Controller (MVC) architecture-based presentation using the JSF framework. Extensively used the Core Java API and Spring API in developing business logic.
- Designed and developed agile, lightweight solutions and integrated applications using and integrating frameworks such as Struts and Spring.