- 6+ years of strong experience in data analytics and data engineering using the Hadoop ecosystem.
- Currently working as a Big Data Hadoop Developer at Confidential, working closely with business and development teams.
- 4+ years of experience developing Big Data projects using HDFS, Spark, Java, Scala, Python, Hive, Pig, Impala, Sqoop, Kafka, Oozie, YARN, MapReduce, HBase, Flume, Cassandra, the ELK stack, and the AWS stack.
- Developed Spark scripts using Scala shell commands as per requirements.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to implementing Impala in the project.
- Experience in developing applications using Big Data, Java, and J2EE technologies.
- Extensive knowledge of Java, J2EE, Servlets, JSP, JDBC, EJB/MDB, JMS, Struts, the Spring Framework, and web services development in the telecom domain.
- Worked on WebLogic, WildFly, and Tomcat servers for development and deployment of Java/J2EE applications.
- Valuable experience with Spring and Hibernate, and expertise in developing JavaBeans.
- Working knowledge of WebLogic Server clustering.
- Expertise in unit testing using JUnit.
- Experience in error logging and debugging using Log4J.
- Strong knowledge of creating and reviewing data models in RDBMSs such as Oracle 10g and MySQL.
- Worked with operating systems like Linux, UNIX, Solaris, and Windows 98/NT/2000/XP/Vista/7.
- Experience working with version control tools such as Git, TFS, CVS, and ClearCase.
- Goal-oriented, organized team player with good interpersonal skills; thrives in group environments as well as individually.
- Strong business and application analysis skills with excellent communication and professional abilities. Experience working with agile tools such as KanbanFlow, Jira, and Rally.
Languages: Java, JavaScript, Python, SQL, XML, HTML, Scala
J2EE Technologies: Servlets/JSP, Java Beans, JDBC, JMS, EJB, web services, GWT
Databases: Oracle 10g, DB2, TOAD, MongoDB
Big Data Technologies: Hadoop, Hive, Impala, MapReduce, Solr, Spark, Kafka, Sqoop, Elasticsearch (ELK)
Cloud Services: Amazon EMR, S3, AWS Glue, Athena, Presto
NoSQL: Cassandra, HBase
EAI Technologies: Oracle SOA, BPEL, Tibco BW, Tibco EMS, Apache Camel
COTS: Oracle OSM 7.2.2
Application Servers: Tomcat 6, WebLogic 12.x, JBoss 6.x, WildFly
Frameworks: Struts 1.2, Spring, Hibernate, Axis2, JAX-WS, Play, Akka
Operating Systems: Linux, UNIX, Windows 98/NT/2000/XP/Vista
Java IDE: Eclipse, EditPlus, and JDeveloper
Configuration tools: Git, VSS, Clear Case, StarTeam, SVN
Design Tools: Microsoft Visio
Testing Tools: SOAPUI
Sr. Hadoop/Java/J2EE Developer
Confidential, Mclean, VA
- Working knowledge of Spark RDDs, the DataFrame API, Dataset API, Data Source API, Spark SQL, and Spark Streaming.
- Performed Spark join optimizations, troubleshot and monitored jobs, and wrote efficient code in Scala.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Analyzed the Hadoop stack and various big data analytics tools, including Pig, Hive, HBase, and Sqoop.
- Developed a Spark Streaming script that consumes topics from Kafka, a distributed messaging source, and periodically pushes batches of data to Spark for real-time processing.
- Experienced in implementing the Cloudera distribution.
- Created Hive tables and worked with them for data analysis to meet requirements.
- Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
- Used Spark DataFrame operations to perform required validations on the data and to run analytics on Hive data.
- Experienced in working with Amazon Elastic MapReduce (EMR).
- Developed MapReduce programs for refined queries on big data.
- In-depth understanding of classic MapReduce and YARN architecture.
- Worked with the business team to create Hive queries for ad hoc access.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Wrote Hive UDFs to implement business logic.
- Analyzed data using Hive queries, SQL, and Spark Streaming.
- Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
- Involved in creating a generic Sqoop import script for loading data from RDBMSs into Hive tables.
- Designed column families in Cassandra, ingested data from RDBMSs, performed data transformations, and exported the transformed data to Cassandra as per business requirements.
- Extracted files from Cassandra through Sqoop and placed them in HDFS for further processing.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into HDFS for pre-processing.
- Created detailed AWS security groups that act as virtual firewalls, controlling the traffic allowed to reach one or more AWS EC2 instances.
- Involved in creating a data lake by extracting customer data from various sources into HDFS, including Excel files, databases, and server log data.
- Created HBase tables to store variable data formats coming from different portfolios. Performed real-time analytics on HBase using the Java API and REST API.
- Performed test runs of the module components to assess productivity.
- Wrote a Java program to retrieve data from HDFS and expose it through REST services.
- Used Hue to run Hive queries. Created daily partitions in Hive to improve performance.
- Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
- Experience using version control tools such as GitHub to share code snippets among team members.
- Used Maven 3.3.9 to build and manage Java-based projects. Hands-on experience with Linux and HDFS shell commands. Worked with Kafka for message queuing solutions.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Wrote an HBase client program in Java, along with web services.
Environment: Hadoop, MapReduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, AWS, GitHub, Talend Big Data Integration, Solr, Impala.
Sr. Hadoop/Java/J2EE Developer
Confidential, Atlanta, GA
- Working on migrating the on-premises Hadoop environment to Amazon EMR, with S3 as optional storage.
- Implemented web service calls for different data integrations.
- Implemented a POC for publishing analytics data to a Cassandra column family.
- Implemented a POC that publishes data to a web-based dashboard (D3.js) by calling REST web services for different data integrations.
- Implemented an aggregation solution using Spark, Cassandra, and Tableau.
- Prepared design documents (request-response mapping documents, Hive mapping documents).
- Implemented a POC ETL solution using Spark, Cassandra, Alteryx, and Tableau.
- Worked with the DevOps team to quickly set up an AWS Hadoop environment.
- Implemented a POC aggregation solution using Spark, HBase, and Tableau. Worked with ingestion teams to define the ingestion process.
- Developed applications using Java, J2EE, JSP, and Spring.
- Involved in writing MapReduce jobs in Java.
- Ran POCs with R, Python, and Spark ML to create data analytics reports.
- Designed and implemented a stream filtering system on top of Apache Kafka to reduce stream size.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to implementing Impala in the project.
- Involved in requirement and design activities.
- Involved in writing DAO calls for Cassandra.
- Designed and developed REST web services.
- Read messages from Kafka queues using Spark Streaming.
- Involved in system and manual testing while integrating with different data integration projects.
- Involved in build and deployment activities and process definitions.
- Wrote Hive queries and shell scripts for data integration. Worked as a member of the Big Data team on deliverables such as design, construction, unit testing, and deployment.
- Involved in writing shell scripts for executing Hive queries and loading data files into Hive tables.
- Involved in requirements gathering and design.
- Performed initial setup to receive data from external sources.
- Performed analysis and design of production views.
- Involved in writing various user-defined functions as per the requirements.
- Translated functional and technical requirements into detailed architecture and design.
- Responsible for managing data coming from various sources.
- Implemented a POC for real-time data processing using Kafka-Spark integration (with Scala), publishing to Elasticsearch.
- Experienced in analyzing data with Hive and Spark using Scala.
- Did a POC using the Play and Akka frameworks.
Environment: Java, J2EE, JSP, Spring, REST, Hadoop, Hive, Linux, DataStax Cassandra, Tomcat 6, Log4j, Eclipse, Spark, Scala, SVN, DB2, JAXB, Kafka, Parquet, EMR, S3, Athena, Glue, QuickSight, ELK, Control-M.
Confidential, Cherry Hill, NJ
- Implemented web service calls for different integrations.
- Responsible for building scalable distributed data solutions using Hadoop.
- The project ingests data generated by car sensors; Kafka collects the data from online aggregators into HDFS.
- Created Kafka producers and consumers for Spark Streaming, which receives data from the patients' different learning systems.
- Spark Streaming collects this data from Kafka in near-real-time and performs necessary transformations and aggregation on the fly to build the common learner data model.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Used AWS to spin up EMR clusters to process large volumes of data stored in S3 and push it to HDFS. Implemented automation and related integration technologies with Puppet.
- Used Spark SQL to read Hive tables into Spark for faster data processing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them per the recommendations.
- Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
- Debugged and resolved major issues with Cloudera Manager by working with the Cloudera team.
- Worked on the proof-of-concept for Apache Hadoop 1.20.2 framework initiation.
- Used Apache Oozie for scheduling and managing the Hadoop Jobs. Extensive experience with Amazon Web Services (AWS).
- Developed Python/Django application for Google Analytics aggregation and reporting.
- Developed and updated social media analytics dashboards on a regular basis.
- Good understanding of NoSQL databases such as HBase, Cassandra and MongoDB.
- Supported MapReduce programs running on the cluster and wrote custom MapReduce scripts in Java for data processing.
- Worked with Apache NiFi for data ingestion. Triggered shell scripts and scheduled them using NiFi.
- Monitored all NiFi flows to receive notifications when no data passed through a flow for longer than a specified time.
- Created NiFi flows to trigger Spark jobs and send email notifications on failures.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance. Involved in moving log files generated from various sources to HDFS through Flume for further processing, and processed the files using Piggybank functions.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile devices, and network devices, and pushed it into HDFS. Used Flume to stream log data from various sources.
- Used the Avro file format compressed with Snappy for intermediate tables for faster data processing. Used the Parquet file format for published tables and created views on the tables.
- Created Sentry policy files to give business users access to the required databases and tables from Impala in the dev, test, and prod environments.
- Used Amazon Kinesis Data Streams to build custom applications that analyze data streams using popular stream processing frameworks.
- Used Amazon Kinesis Data Analytics to analyze data streams with SQL.
- Good understanding of ETL tools and how they can be applied in a Big Data environment.
Environment: Hadoop, MapReduce, Cloudera, Spark, Kafka, HDFS, Hive, Pig, Oozie, Scala, Eclipse, Flume, Kinesis, Oracle, UNIX Shell Scripting
- Solid understanding of Hadoop HDFS, MapReduce, and other ecosystem projects.
- Installed and configured Hadoop clusters. Worked with the Cloudera support team to fine-tune the cluster. Developed a custom FileSystem plugin for Hadoop so it can access files on Hitachi Data Platform.
- The plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly. The plugin also provided data locality for Hadoop across host nodes and virtual machines.
- Analyzed the Hadoop cluster using different big data analytics tools, including Kafka (version 0.8.2.2), Pig (version 0.12.0), Hive (version 0.10.0), and MapReduce (MR1 and MR2).
- Collected and aggregated large amounts of log data using Apache Flume (version 1.5.0) and staged the data in HDFS for further analysis.
- Implemented various MapReduce jobs in custom environments and loaded the results into HBase tables by generating Hive queries.
- Streamed data in real time using Spark (version 1.4.0) with Kafka (version 0.8.2.2).
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Tuned the performance of Pig queries and loaded data from the Linux file system into HDFS. Imported and exported data to and from HDFS using Sqoop (version 1.4.3) and Kafka.
- Supported MapReduce programs running on the cluster. Gained experience in managing and reviewing Hadoop log files. Scheduled Oozie (version 4.0.0) workflows to run multiple Pig jobs.
- Automated the jobs that pull data from an FTP server and load it into Hive tables using Oozie workflows. Used HCatalog to access Hive table metadata from MapReduce or Pig code.
- Computed various metrics using Java MapReduce, including metrics that define user experience, revenue, etc.
- Used NoSQL databases, including HBase and MongoDB. Exported result sets from Hive to MySQL using shell scripts.
- Implemented SQL and PL/SQL stored procedures. Actively involved in code review and bug fixing to improve performance.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Spark, Kafka, Linux, Cloudera, Java APIs, Java Collections, SQL, NoSQL, HBase, MongoDB
- Understood and analyzed requirements. Designed, developed, and validated the user interface using HTML, JavaScript, and XML.
- Used the JavaMail notification mechanism to send confirmation emails to customers about payments. Handled database access by implementing a controller servlet.
- Implemented PL/SQL stored procedures and triggers. Used JDBC prepared statements to call them from servlets for database access.
- Designed and documented the stored procedures. Used HTML extensively for web-based design. Involved in unit testing of various components.
- Used the Eclipse IDE as the front-end development environment. Worked on the database interaction layer for insert, update, and retrieval operations on the Oracle database by writing stored procedures.
- Deployed the application on Tomcat Application Server. Created UNIX shell and Perl utilities for testing, data parsing and manipulation.
- Experience in implementing Web Services and XML/HTTP technologies. Involved in writing JUnit Test Cases.
- Used Log4j to log application errors. Planned and mapped system requirements to use cases and scenarios using UML.
- Actively participated in bug analysis and fixing during integration testing.