- 8 years of experience in the IT industry implementing, developing, and maintaining various web-based applications using Java, J2EE technologies, and the Big Data ecosystem.
- Strong knowledge of Hadoop architecture and daemons such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of MapReduce concepts.
- Well versed in implementing end-to-end solutions on big data using the Hadoop framework.
- Hands-on experience in writing MapReduce programs in Java to handle different data sets using map and reduce tasks.
- Hands-on experience with Sequence files, RC files, Combiners, Counters, dynamic partitions, and bucketing for best practices and performance improvement.
- Worked with join patterns and implemented map-side and reduce-side joins using MapReduce.
- Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
- Designed Hive queries and Pig scripts to perform data analysis, data transfer, and table design.
- Having experience in developing a data pipeline using Kafka to store data into HDFS.
- Good knowledge on AWS infrastructure services Amazon Simple Storage Service (Amazon S3),EMR, and Amazon Elastic Compute Cloud (Amazon EC2).
- Implemented ad-hoc queries using Hive to perform analytics on structured data.
- Expertise in writing Hive UDFs and Generic UDFs to incorporate complex business logic into Hive queries.
- Experienced in optimizing Hive queries by tuning configuration parameters.
- Involved in designing the data model in Hive for migrating the ETL process into Hadoop and wrote Pig Scripts to load data into Hadoop environment.
- Implemented SQOOP for large dataset transfer between Hadoop and RDBMS.
- Extensively used Apache Flume to collect the logs and error messages across the cluster.
- Experienced in performing real time analytics on HDFS using HBase.
- Used Cassandra CQL with Java API’s to retrieve data from Cassandra tables.
- Experience in writing shell scripts to export data from MySQL servers to HDFS.
- Worked on Implementing and optimizing Hadoop/MapReduce algorithms for Big Data analytics.
- Experience in working with Amazon EMR, Cloudera (CDH3 & CDH4), and Hortonworks Hadoop distributions.
- Worked with Oozie and ZooKeeper to manage job flow and coordination in the cluster.
- Experience in performance tuning, monitoring the Hadoop cluster by gathering and analyzing the existing infrastructure using Cloudera manager.
- Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
- Good experience in writing Spark applications using Python and Scala.
- Experience processing Avro data files using Avro tools and MapReduce programs.
- Implemented pre-defined operators in Spark such as map, flatMap, filter, reduceByKey, groupByKey, aggregateByKey, and combineByKey.
- Used sbt to develop Scala-based Spark projects and executed them using spark-submit.
- Added security to the cluster by integrating Kerberos.
- Worked on multiple PoCs on Apache NiFi.
- Worked on different file formats (ORCFILE, TEXTFILE) and different Compression Codecs (GZIP, SNAPPY, LZO).
- Experienced in writing and implementing unit test cases using testing frameworks such as JUnit, EasyMock, and Mockito.
- Worked on Talend Open Studio and Talend Integration Suite.
- Adequate knowledge and working experience with Agile and waterfall methodologies.
- Good understanding of all aspects of Testing such as Unit, Regression, Agile, White-box, Black-box.
- Expert in developing applications using Servlets, JPA, JMS, Hibernate, and Spring frameworks.
- Extensive experience in implementing and consuming REST-based web services.
- Good knowledge of Web/Application Servers like Apache Tomcat, IBM WebSphere and Oracle WebLogic.
- Ability to work with onsite and offshore team members.
- Able to work on own initiative; highly proactive, self-motivated, and resourceful.
- Strong debugging and critical-thinking skills, with a good understanding of evolving frameworks, methodologies, and strategies.
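The map-side join pattern listed above keeps the smaller dataset in memory and joins during the map phase, avoiding a shuffle. A minimal plain-Java sketch of that lookup logic (the record layout and names are hypothetical; real MapReduce would distribute the small table via the DistributedCache):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSideJoinSketch {
    // The small "customers" table is held in memory (as DistributedCache
    // would provide in real MapReduce); each "order" record is joined
    // during the map phase with a plain hash lookup, so no shuffle is needed.
    public static List<String> join(Map<String, String> customers,
                                    List<String[]> orders) {
        List<String> joined = new ArrayList<>();
        for (String[] order : orders) {            // order = {customerId, item}
            String name = customers.get(order[0]); // in-memory lookup
            if (name != null) {
                joined.add(name + " bought " + order[1]);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> customers = new HashMap<>();
        customers.put("c1", "Alice");
        List<String[]> orders = List.of(new String[]{"c1", "book"});
        System.out.println(join(customers, orders)); // [Alice bought book]
    }
}
```

A reduce-side join, by contrast, ships both datasets through the shuffle keyed on the join column and pairs them in the reducer, which works for two large inputs at the cost of the shuffle.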
Big Data Ecosystems: Hadoop, Map Reduce, HDFS, Zookeeper, Hive, Pig, Sqoop, Oozie, Flume, Yarn, Spark, NiFi
Database Languages: SQL, PL/SQL, Oracle
Programming Languages: Java, Scala
Frameworks: Spring, Hibernate, JMS
Web Services: RESTful web services
Databases: RDBMS, HBase, Cassandra
IDE: Eclipse, IntelliJ
Platforms: Windows, Linux, Unix
Application Servers: Apache Tomcat, WebSphere, WebLogic, JBoss
Methodologies: Agile, Waterfall
ETL Tools: Talend
Confidential, St. Louis, MO
Spark/Scala Developer
Responsibilities:
- Analyzed and defined the research strategy, and determined the system architecture and requirements needed to achieve project goals.
- Developed multiple Kafka producers and consumers as per the software requirement specifications.
- Used Kafka for log aggregation: collecting physical log files from servers and placing them in a central location such as HDFS for processing.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS.
- Used various Spark transformations and actions to cleanse the input data.
- Developed shell scripts to generate the hive create statements from the data and load the data into the table.
- Wrote MapReduce jobs using the Java API and Pig Latin.
- Optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
- Involved in writing custom MapReduce programs using the Java API for data processing.
- Integrated Maven build and designed workflows to automate the build and deploy process.
- Developed a linear regression model in Spark with the Scala API to predict continuous measurements and improve observation of wind turbine data.
- Created Hive tables as internal or external per requirements, defined with appropriate static/dynamic partitions and bucketing for efficiency.
- Loaded and transformed large sets of structured and semi-structured data using Hive.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
- Used Spark and Spark-SQL to read the parquet data and create the tables in hive using the Scala API.
- Developed Hive queries for the analysts.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
- Implemented Cassandra access using the DataStax Java API.
- Very good understanding of Cassandra cluster mechanisms, including replication strategies, snitches, gossip, consistent hashing, and consistency levels.
- Experienced in using the Spark application master to monitor Spark jobs and capture their logs.
- Implemented Spark applications using Scala, utilizing DataFrames and the Spark SQL API for faster testing and processing of data.
- Performed analytics and visualization on log data to estimate error rates and study the probability of future errors using regression models.
- Used the WebHDFS REST API to make HTTP GET, PUT, POST, and DELETE requests from the web server to perform analytics on the data lake.
- Worked on multiple PoCs on Apache NiFi: executing Spark and Sqoop scripts through NiFi, creating scatter-and-gather patterns, ingesting data from Postgres to HDFS, fetching Hive metadata and storing it in HDFS, and creating a custom NiFi processor to filter text from FlowFiles.
- Used Kafka to build a customer-activity tracking pipeline as a set of real-time publish-subscribe feeds.
- Provided cluster coordination services through ZooKeeper.
Environment: HDP 2.3.4, Hadoop, Hive, HDFS, HPC, WebHDFS, WebHCat, Spark, Spark SQL, Kafka, AWS, Java, Scala, web servers, Maven and sbt builds, Rally (CA Technologies)
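The WebHDFS calls used in this project map file-system operations onto plain HTTP requests against the NameNode, selected by an `op` query parameter. A minimal sketch of how those request URLs are formed (the host and port here are hypothetical):

```java
public class WebHdfsUrlSketch {
    // WebHDFS maps file operations onto HTTP verbs plus an op parameter:
    //   GET    + op=OPEN   -> read a file
    //   PUT    + op=CREATE -> create a file
    //   POST   + op=APPEND -> append to a file
    //   DELETE + op=DELETE -> delete a file
    public static String url(String nameNodeHost, int port, String path, String op) {
        return "http://" + nameNodeHost + ":" + port + "/webhdfs/v1" + path + "?op=" + op;
    }

    public static void main(String[] args) {
        System.out.println(url("namenode.example.com", 50070, "/data/logs/app.log", "OPEN"));
        // http://namenode.example.com:50070/webhdfs/v1/data/logs/app.log?op=OPEN
    }
}
```

Any HTTP client can then issue the request, which is what makes WebHDFS convenient for web servers that cannot link against the Hadoop client libraries.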
Spark/Scala Developer
Responsibilities:
- Designed and developed data loading strategies, transformation for business to analyze the datasets.
- Processed flat files in various file formats and stored them in various partition models in HDFS.
- Responsible for building, developing, and testing shared components used across modules.
- Performed sort, join, aggregation, filter, and other transformations on the datasets using Spark.
- Involved in developing a linear regression model for predicting continuous measurement.
- Implemented advanced procedures such as text analytics and processing using Spark's in-memory computing capabilities.
- Experience in extracting appropriate features from data sets in order to handle bad, null, and partial records using Spark SQL.
- Collected data from an AWS S3 bucket in near real time using Spark Streaming, and performed the necessary transformations and aggregations to build the data model and persist the data in HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Expert in implementing Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
- Managing and scheduling Jobs on a Hadoop cluster using Oozie.
- Used the Spark-Cassandra connector to load data to and from Cassandra.
- Experience in building Real-time Data Pipelines with Kafka Connect and Spark Streaming.
- Developed Spark-Cassandra connector jobs to load data from flat files into Cassandra for analysis.
- Imported the data from different sources like AWS S3, LFS into Spark RDD.
- Created consumer APIs using Kafka.
- Created Hive tables, loaded them with data, and wrote Hive queries.
- Developed end-to-end data processing pipelines that receive data using Kafka as a distributed messaging system and persist it into Cassandra.
- Worked on a POC to perform sentiment analysis of twitter data using Open NLP API.
- Created mappings and workflows to extract and load data from relational databases, flat-file sources, and legacy systems using Talend.
- Designed and developed ETL jobs using Talend Integration Suite (Talend 5.2.2).
- Experienced in managing and reviewing log files using Web UI and Cloudera Manager.
- Involved in creating External Hive tables and involved in data loading and writing Hive UDFs.
- Experience in using various compression techniques such as Snappy, LZO, and GZIP to save data and optimize data transfer over the network using Avro and Parquet.
- Involved in unit testing and user documentation and used Log4j for creating the logs.
Environment: Apache Spark, Hadoop, HDFS, Hive, Kafka, Sqoop, Scala, Talend, Cassandra, Oozie, Cloudera, Impala, Linux
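The per-key aggregations used throughout this project (reduceByKey and related operators) merge all values sharing a key with an associative function. A plain-Java stand-in for those semantics, with no Spark dependency (this is not Spark's implementation, just the merge-per-key behavior):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class ReduceByKeySketch {
    // Merge all values that share a key with an associative function,
    // mirroring the semantics of Spark's reduceByKey (which additionally
    // combines locally on each partition before shuffling, unlike groupByKey).
    public static <K, V> Map<K, V> reduceByKey(List<Map.Entry<K, V>> pairs,
                                               BinaryOperator<V> merge) {
        Map<K, V> result = new HashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            result.merge(pair.getKey(), pair.getValue(), merge);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> logLevels = List.of(
            Map.entry("error", 1), Map.entry("warn", 1), Map.entry("error", 1));
        Map<String, Integer> counts = reduceByKey(logLevels, Integer::sum);
        System.out.println(counts.get("error")); // 2
    }
}
```

The local pre-merge is why reduceByKey is preferred over groupByKey for aggregations: far less data crosses the shuffle.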
Confidential, Norwalk, CT
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Collected and aggregated large amounts of web log data from different sources such as webservers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
- Installed and configured Hadoop MapReduce, HDFS, developed multiple Map Reduce jobs in java for data cleaning and processing.
- Created Hive tables, loaded data, wrote Hive queries, and generated partitions and buckets for optimization.
- Migrated tables from RDBMS into Hive tables using Sqoop, and later generated visualizations using Tableau.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Transformed data from mainframe tables to HDFS and HBase tables using Sqoop.
- Defined the Accumulo tables and loaded data into tables for near real-time data reports.
- Created the Hive external tables using Accumulo connector.
- Written Hive UDFs to sort Structure fields and return complex data type.
- Used different data formats (text and ORC) while loading data into HDFS.
- Involved in creating Shell scripts to simplify the execution of all other scripts (Pig, Hive, Sqoop, Impala and MapReduce) and move the data inside and outside of HDFS.
- Created files and tuned SQL queries in Hive using Hue.
- Experience working with Apache SOLR for indexing and querying.
- Created custom Solr query segments to optimize search matching.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data.
- Ingested data into HBase using the HBase shell as well as the HBase client API.
- Designed the ETL process and created the high-level design document, including logical data flows, source data extraction, database staging, job scheduling, and error handling.
- Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into target database.
Environment: Hadoop, Cloudera, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, HBase, Accumulo, Oozie, Tableau, Java, Talend, Hue, Flume, Solr, Git, Maven.
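The state-based bucketing above assigns each row to a bucket by hashing the bucket column modulo the bucket count, so equal keys always land in the same bucket. A simplified sketch of that assignment (Hive's actual hash function varies by column type; this is illustrative only):

```java
public class BucketingSketch {
    // A row lands in bucket hash(column) mod numBuckets. Co-locating equal
    // keys per bucket is what enables bucket-based (bucket map) joins:
    // matching buckets of two tables can be joined independently.
    public static int bucketFor(String key, int numBuckets) {
        return Math.floorMod(key.hashCode(), numBuckets);
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        int bucket = bucketFor("MO", numBuckets);
        // Deterministic: the same state always maps to the same bucket,
        // and the bucket index always falls in [0, numBuckets).
        System.out.println(bucket == bucketFor("MO", numBuckets)); // true
        System.out.println(bucket >= 0 && bucket < numBuckets);    // true
    }
}
```

Because both sides of a bucket map join are bucketed the same way on the join key, Hive can join bucket i of one table against bucket i of the other without a full shuffle.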
- Experience in Importing and exporting data into HDFS and Hive using Sqoop.
- Developed Flume Agents for loading and filtering the streaming data into HDFS.
- Experienced in handling data from different data sets, join them and preprocess using Pig join operations.
- Moved bulk data into HBase using MapReduce integration.
- Developed MapReduce programs to clean and aggregate the data.
- Developed HBase data model on top of HDFS data to perform real time analytics using Java API.
- Developed different kind of custom filters and handled pre-defined filters on HBase data using API.
- Strong understanding of the Hadoop ecosystem, including HDFS, MapReduce, HBase, ZooKeeper, Pig, Hadoop Streaming, Sqoop, Oozie, and Hive.
- Implemented counters on HBase data to count total records across different tables.
- Experienced in handling Avro data files by passing schema into HDFS using Avro tools and Map Reduce.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Implemented secondary sorting to globally sort reducer output in MapReduce.
- Implemented a data pipeline by chaining multiple mappers using ChainMapper.
- Created Hive dynamic partitions to load time-series data.
- Experienced in handling different types of joins in Hive, such as map joins, bucket map joins, and sorted bucket map joins.
- Created tables, partitions, and buckets, and performed analytics using Hive ad-hoc queries.
- Experienced in importing/exporting data between HDFS/Hive and relational databases and Teradata using Sqoop.
- Handled continuous streaming data from different sources using Flume, with HDFS set as the destination.
- Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
- Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, RDBMS/DB, flat files, MySQL, CSV, Avro data files.
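The secondary sort mentioned above relies on a composite key whose comparator orders first by the grouping field and then by the secondary field; MapReduce sorts on the full key during the shuffle, while a grouping comparator on the first field alone delivers each group, already ordered, to a single reduce call. A minimal comparator sketch (the field names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SecondarySortSketch {
    // Composite key: group records by 'station', order by 'timestamp' within
    // each group. In real MapReduce this compareTo runs during the shuffle
    // sort, and a grouping comparator on 'station' alone defines the groups.
    static class EventKey implements Comparable<EventKey> {
        final String station;
        final long timestamp;
        EventKey(String station, long timestamp) {
            this.station = station;
            this.timestamp = timestamp;
        }
        @Override
        public int compareTo(EventKey other) {
            int byStation = station.compareTo(other.station);
            return byStation != 0 ? byStation : Long.compare(timestamp, other.timestamp);
        }
    }

    public static void main(String[] args) {
        List<EventKey> keys = new ArrayList<>(List.of(
            new EventKey("B", 5), new EventKey("A", 9), new EventKey("A", 2)));
        Collections.sort(keys);
        System.out.println(keys.get(0).station + ":" + keys.get(0).timestamp); // A:2
    }
}
```

The value ordering thus comes for free from the shuffle rather than from buffering and sorting inside the reducer.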
Java/J2EE Developer
Responsibilities:
- Responsible for enhancements to mutual funds products written in Java, Servlets, XML, and XSLT.
- Implemented different J2EE Design Patterns such as Session Facade, Observer, Observable and Singleton, Business Delegate to accommodate feature enhancements and change requests.
- Implemented Spring (MVC) design paradigm for website design.
- Wrote extensive core Java and multi-threading code in the application.
- Optimized SAX and DOM parsers for XML production data.
- Implemented the JMS Topic to receive the input in the form of XML and parsed them through a common XSD.
- Written JDBC statements, prepared statements, and callable statements in Java, JSPs and Servlets.
- Followed the Scrum approach for the development process.
- Worked on Spring Integration for communicating with business components, and on Spring-Hibernate integration for ORM mappings.
- Modified and added database functions, procedures and triggers pertaining to business logic of the application.
- Used TOAD to check and verify all the database turnaround times and also tested the connections for response times and query round trip behavior.
- Used Ant to build the code for the production line.
- Used the Eclipse IDE for all coding in Java, Servlets, and JSPs.
- Used IBM ClearCase for versioning and maintenance.
- Involved in discussions with the business analysts for bug validation and fixing.
- Modified technical design document, functional design document to accommodate change requests.
- Wrote JUnit test cases for system testing, Used Log4j for logging.
- Used JIRA as a bug-reporting tool for updating the bug report.
- Involved in performance tuning wherever there was latency or delay in code execution.
- Extensive Involvement in analyzing the requirements and detailed system study.
- Worked on Session Tracking in JSP, Servlets.
- Involved in the analysis, design, and development phase of Software Development Lifecycle.
- All business user interfaces were developed using JSP, with business logic implemented in Servlets.
- Responsible for coding and deploying according to the Client requirements.
- Configured JSP content in XML files.
- Implemented session beans using EJB 2.0.
- Responsible for performing Code Reviewing and Debugging.
- Worked on tool development, performance testing & defects fixing.
- Involved in the design of the application database schema.
- Wrote complex SQL queries, stored procedures, functions, and triggers in PL/SQL.
- Handled exceptions using try, catch, and finally blocks.
- Worked on SVN version controlling.
- Developed and deployed the application on the IBM WebSphere Application Server.
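The DOM parsing of XML production data above can be sketched with the JDK's built-in parser, which loads the document into an in-memory tree (SAX would instead stream events without building one). The XML layout and element names here are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomParseSketch {
    // Parse an XML string into a DOM tree and read the text content of the
    // first element with the given tag name.
    public static String firstText(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName(tag).item(0).getTextContent();
        } catch (Exception e) {
            throw new RuntimeException("XML parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<fund><name>Growth Fund</name></fund>";
        System.out.println(firstText(xml, "name")); // Growth Fund
    }
}
```

DOM suits documents small enough to hold in memory and navigate repeatedly; SAX is the better fit for large feeds read once.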