- About 8 years of IT/business solutions experience in the analysis, design, development, and implementation of cost-effective, high-quality, high-performance, and innovative technology solutions in the Healthcare, Insurance, and IT sectors
- 4+ years of experience in Big Data using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, Kafka, Oozie, Sqoop, Avro, and ZooKeeper
- Well versed in configuring and administering the Hadoop cluster using Cloudera and Hortonworks
- Experience in creating real-time data streaming solutions using Apache Spark / Spark Streaming, Apache Storm, Kafka, and Flume
- Currently working on Spark applications extensively, using Scala as the main programming language
- Processing this data using the Spark Streaming API with Scala
- Used Spark DataFrames, Spark SQL, and the RDD API of Spark for performing various data transformations and dataset building
- Used the Scala collection framework to store and process metadata and other related information
- Experience in using DataStax Spark-Cassandra connectors to get data from Cassandra tables and process it using Apache Spark
- Exposure to Data Lake implementation; developed data pipelines and applied business logic utilizing Apache Spark
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing
- Hands-on experience with real-time processing on NoSQL databases like MongoDB, HBase, and Cassandra
- Experience in creating MongoDB clusters and hands-on experience with complex MongoDB aggregate functions and mapping
- Designed data models in Cassandra and worked with the Cassandra Query Language (CQL)
- Worked with NoSQL data stores, primarily HBase, using the HBase Java API and Hive integration
- Experience in using Flume to load log files into HDFS and Oozie for data scrubbing and processing
- Experienced in handling skewed and sparse data using Pig Latin scripts and joins
- Experience in performance tuning of Hive queries and Java MapReduce programs for scalability and faster execution
- Experienced in handling real-time analytics using HBase on top of HDFS data
- Experience in transformations, grouping, aggregations, and joins using the Kafka Streams API
- Hands-on experience deploying Kafka Connect in standalone and distributed mode and creating Docker containers
- Created topics and wrote Kafka producers and consumers in Java as required; developed Kafka source/sink connectors to store the new streaming data into topics and move it from topics to the required databases by performing ETL tasks; also used the Akka toolkit with Scala to perform some builds
- Experienced in collecting metrics for Hadoop clusters using Ambari and Cloudera Manager
- Knowledge of Storm architecture; experience in using data modeling tools like Erwin
- Excellent experience in using scheduling tools to automate batch jobs
- Hands-on experience in using Apache Solr/Lucene
- Expertise in using SQL Server: SQL queries, stored procedures, and functions
- Hands-on experience in app development using Java, Hadoop, RDBMS, and Linux shell scripting
- Oracle JAVA certified professional and certified in AWS Solutions Architect-Associate
- Strong experience in Extending Hive and Pig core functionality by writing custom UDFs
- Experience in software design, development, and implementation of client/server web applications
- Excellent knowledge of object-oriented design and development
- Ability to work in a team and individually on many cutting-edge technologies, with excellent management skills, business understanding, and strong communication skills
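The grouping and aggregation patterns referenced above (Spark reduceByKey-style sums, Kafka Streams aggregations) can be sketched in plain Python; the records and field names below are hypothetical stand-ins, since no live cluster is assumed:

```python
from collections import defaultdict

# Hypothetical click-stream records; field names are illustrative only.
records = [
    {"user": "a", "event": "view", "amount": 10},
    {"user": "b", "event": "buy",  "amount": 25},
    {"user": "a", "event": "buy",  "amount": 15},
]

def aggregate_by_key(rows, key, value):
    """Group rows by `key` and sum `value` -- the same shape as a
    reduceByKey / GROUP BY aggregation in Spark or Kafka Streams."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

print(aggregate_by_key(records, "user", "amount"))  # {'a': 25, 'b': 25}
```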
Hadoop/Big Data: HDFS, MapReduce, YARN, HBase, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, Splunk, Hortonworks, Cloudera
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
IDEs: Eclipse, NetBeans, IntelliJ, PyCharm
ETL: Talend, SSIS
Frameworks: MVC, Struts, Hibernate, Spring, Spring Boot
Programming languages: C, C++, Java, Python, Scala, Linux shell scripts
Databases: RDBMS (MySQL, DB2, MS SQL Server, PostgreSQL), NoSQL (MongoDB, HBase, Cassandra)
Amazon Web Services: EMR, EC2, S3, RDS, CloudSearch, Redshift, Data Pipeline, Lambda
Web Servers: WebLogic, WebSphere, Apache Tomcat
Build Tools: Ant, Maven, Gradle, Akka
Version Control: Git, Tortoise SVN
Cloud: AWS, Microsoft Azure
Development Methodologies: Agile, Scrum, Waterfall
Sr. Spark / Scala Developer
Confidential, Minneapolis, MN
Roles & Responsibilities:
- Worked with Spark for improving performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs
- Developed a preprocessing job using Spark DataFrames to transform JSON documents into flat files
- Loaded D-Stream data into Spark RDDs and performed in-memory data computation to generate the output response
- Processed big data with Amazon EMR across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs
- Worked on Big Data infrastructure for batch processing and real-time processing using Apache Spark
- Developed Apache Spark applications by using Scala for data processing from various streaming sources
- Processed the web server logs by developing multi-hop Flume agents using an Avro sink and loaded the data into Cassandra for further analysis; extracted files from Cassandra through Flume
- Responsible for design and development of Spark SQL scripts based on functional specifications
- Worked on the large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive, and Cassandra
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster processing of data
- Developed helper classes for abstracting Cassandra cluster connections to act as a core toolkit
- Involved in creating a Data Lake by extracting customers' data from various data sources into HDFS, which included data from Excel, databases, and log data from servers
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class
- Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them using Hive
- Wrote MapReduce (Hadoop) programs to convert text files into Avro and load them into Hive tables
- Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system
- Extended Hive/Pig core functionality by using custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF), and User Defined Aggregating Functions (UDAF) for Hive and Pig
- Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers
- Used Apache Kafka functionality like distribution, partitioning, and the replicated commit log service for messaging
- Partitioned data streams using Kafka; designed and configured a Kafka cluster to accommodate heavy throughput
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Used Apache Oozie for scheduling and managing multiple Hive jobs; knowledge of HCatalog for Hadoop-based storage management
- Migrated an existing on-premises application to Amazon Web Services (AWS) and used its services like EC2 and S3 for small-data-set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR
- Developed solutions to pre-process large sets of structured, semi-structured data, with different file formats like Text, Avro, Sequence, XML, JSON, and Parquet
- Generated various kinds of reports using Pentaho and Tableau based on Client specification
- Gained exposure to new tools like Jenkins, Chef, and RabbitMQ
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint
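As an illustration of the JSON-to-flat-file preprocessing described above, here is a minimal stand-alone Python sketch of the flattening step (the sample document and field names are hypothetical; the actual job ran on Spark DataFrames):

```python
import json

def flatten(doc, prefix=""):
    """Recursively flatten a nested JSON document into dot-separated
    column names -- the kind of record a flat-file (CSV/TSV) load expects."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

# Hypothetical input document
raw = '{"id": 1, "user": {"name": "ann", "zip": "55401"}}'
print(flatten(json.loads(raw)))
# {'id': 1, 'user.name': 'ann', 'user.zip': '55401'}
```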
Environment: Hadoop, MapReduce, HDFS, Yarn, Hive, Sqoop, Oozie, Spark, Scala, AWS, EC2, S3, EMR, Cassandra, Flume, Kafka, Pig, Linux, Shell Scripting
Sr. Spark / Bigdata Developer
Confidential - Worcester, MA
Roles & Responsibilities:
- Built a scalable distributed Hadoop cluster running Hortonworks Data Platform (HDP 2.6)
- Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and in optimizing it using SparkContext, Spark SQL, and pair RDDs
- Serialized JSON data and stored it in tables using Spark SQL
- Spark Streaming collects data from Kafka in near real time, performs the necessary transformations and aggregations to build the common learner data model, and stores the data in a NoSQL store (HBase)
- Good knowledge of the Spark framework for both batch and real-time data processing
- Hands-on experience with Spark MLlib, used for predictive intelligence, customer segmentation, and smooth maintenance in Spark Streaming
- Developed Spark Streaming programs that take data from Kafka and push it to different sinks
- Loaded data from different data sources (Teradata, DB2, Oracle, and flat files) into HDFS using Sqoop and loaded it into Hive tables, which are partitioned
- Created different Pig scripts and converted them to shell commands to provide aliases for common operations in the project business flow
- Implemented partitioning and bucketing in Hive for better organization of the data
- Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes
- Created a few Hive UDFs as well, to hide or abstract complex repetitive rules
- Developed Oozie workflows for daily incremental loads, which get data from Teradata and then import it into Hive tables
- Developed bash scripts to bring log files from the FTP server and then process and load them into Hive tables
- All the bash scripts are scheduled using the Resource Manager Scheduler
- Developed MapReduce programs for applying business rules to the data
- Developed a NiFi workflow to pick up data from the Data Lake as well as from servers and send it to a Kafka broker
- Involved in loading and transforming large sets of structured data from router locations to the EDW using an Apache NiFi data pipeline flow
- Implemented a Kafka event-log producer to publish the logs to a Kafka topic, which is consumed by the ELK (Elasticsearch, Logstash, Kibana) stack to analyze the logs produced by the Hadoop cluster
- Implemented Apache Kafka as a replacement for a more traditional message broker (JMS Solace) to reduce licensing costs, decouple processing from data producers, and buffer unprocessed messages
- Implemented a receiver-based approach, working on Spark Streaming by linking with the StreamingContext using the Java API and handling proper closing and waiting stages
- Experience in implementing rack topology scripts for the Hadoop cluster
- Implemented fixes to resolve issues with the old Hazelcast Entry Processor API
- Used the Akka toolkit with Scala to perform a few builds
- Excellent knowledge of the Talend Administration Console, Talend installation, and using context and global map variables in Talend
- Used dashboard tools like Tableau
- Knowledge of Splunk architecture and its various components (indexer, forwarder, search head, deployment server)
- Used the Talend Admin Console Job Conductor to schedule ETL jobs on a daily and weekly basis
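A minimal HiveQL sketch of the partitioning and bucketing scheme described above (the table, columns, and bucket count are hypothetical):

```sql
-- Hypothetical table; names and types are illustrative only.
CREATE TABLE learner_events (
    user_id BIGINT,
    event   STRING,
    ts      TIMESTAMP
)
PARTITIONED BY (load_date STRING)        -- one HDFS directory per day
CLUSTERED BY (user_id) INTO 32 BUCKETS   -- bucketing for sampling and joins
STORED AS ORC;

-- Dynamic-partition insert from a staging table:
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT INTO TABLE learner_events PARTITION (load_date)
SELECT user_id, event, ts, to_date(ts) FROM learner_events_stg;
```

Partition pruning then lets queries filtered on `load_date` scan only the matching directories, while bucketing keeps rows for the same `user_id` co-located.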
Environment: Hadoop HDP, Linux, MapReduce, HBase, HDFS, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Java, Eclipse, Maven, Splunk, open-source technologies (Apache Kafka, Apache Spark, Hazelcast), Git, Talend
Confidential - Anoka, MN
Roles & Responsibilities:
- Dealt with data migrations from diversified databases into HDFS and Hive using Sqoop
- Wrote a Java program using Apache Tika to get the metadata and content of documents, perform cleansing, build clusters, and save the results to MongoDB
- Implemented dynamic partitions and buckets in Hive for efficient data access
- Utilized Hive to process huge amounts of provider info
- Used Cloudera Impala as a SQL engine for processing the data stored in HBase and HDFS
- Applied automation jobs in the Linux shell
- Built an AWS EC2 instance and migrated data to the cloud
- Developed MapReduce jobs in Java for data cleansing and pre-processing
- Moved data from DB2 and Oracle Exadata to HDFS and vice versa using Sqoop
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis
- Created S3 buckets, managed policies for the S3 buckets, and utilized S3 and Glacier for archival storage and backup on AWS
- Experience with AWS Lambda workflow implementation using Python to interact with applications deployed on EC2 instances and S3 buckets
- Worked with different file formats and compression techniques to determine standards
- Developed data pipelines using Pig and Hive from Teradata and DB2 data sources; these pipelines had customized UDFs to extend the ETL functionality
- Developed user-defined functions in Pig
- Analyzed and transformed data with Hive and Pig
- Developed Hive queries and UDFs to analyze/transform the data in HDFS
- Developed Hive scripts for implementing control-table logic in HDFS
- Designed and implemented partitioning (static, dynamic) and buckets in Hive
- Developed Pig scripts and UDFs per the business logic
- Developed Oozie workflows, which are scheduled through a scheduler on a monthly basis
- Wrote Python scripts to parse XML documents and load the data into a database
- Enhanced existing modules written in Python
- Excellent knowledge of Python collections and multithreading
- Designed and developed read lock capability in HDFS
- Implemented a Hadoop float equivalent to the DB2 decimal
- Involved in end-to-end implementation of ETL logic
- Effective coordination with offshore team and managed project deliverable on time
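A minimal sketch of the XML-to-database loading described above, using only the Python standard library (the XML layout and table schema are hypothetical, and SQLite stands in for the real target database):

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML payload; tag names are illustrative only.
XML = """
<providers>
  <provider id="p1"><name>Acme Clinic</name><state>MN</state></provider>
  <provider id="p2"><name>Delta Care</name><state>LA</state></provider>
</providers>
"""

def load_providers(xml_text, conn):
    """Parse provider records out of XML and insert them into a table;
    returns the number of rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS providers (id TEXT, name TEXT, state TEXT)")
    rows = [
        (p.get("id"), p.findtext("name"), p.findtext("state"))
        for p in ET.fromstring(xml_text).iter("provider")
    ]
    conn.executemany("INSERT INTO providers VALUES (?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")  # stand-in for the real target database
print(load_providers(XML, conn))    # 2
```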
Environment: Hadoop Ecosystem (MapReduce, HDFS, YARN, Hive, Sqoop, Pig Latin, ZooKeeper, Oozie), NoSQL databases (HBase, MongoDB), MySQL, Hortonworks, JIRA, Linux Shell Scripting, Teradata, Eclipse, Java, Python
Confidential - Jefferson, LA
Roles & Responsibilities:
- Hands on experience in loading data from UNIX file system and Teradata to HDFS
- Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster on Cloudera's CDH distribution
- Developed Pig scripts for processing semi-structured data using sorting, joins, and grouping of the data
- Developed Java MapReduce programs on log data to transform it into a structured form to find user location, age group, and time spent
- Collected and aggregated large amounts of weblog data from different sources using Apache Flume and stored the data in HDFS for analysis
- Created HBase tables to store variable data formats coming from different portfolios; performed real-time analytics on HBase using the Java API and REST API
- Used Docker containers in development environment
- Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts)
- Developed ETL using Hive, Oozie, shell scripts, and Sqoop, and analyzed the weblog data using HiveQL
- Supported data analysts in running MapReduce programs
- Experienced in working with Avro data files using the Avro serialization system
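A minimal Flume agent configuration of the kind described above, wiring a log-tailing source through a memory channel to an HDFS sink (the agent name, paths, and capacities are hypothetical):

```properties
# Hypothetical single-hop agent; names and paths are illustrative only.
agent1.sources  = weblogs
agent1.channels = mem
agent1.sinks    = hdfs-out

agent1.sources.weblogs.type = exec
agent1.sources.weblogs.command = tail -F /var/log/httpd/access_log
agent1.sources.weblogs.channels = mem

agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000

agent1.sinks.hdfs-out.type = hdfs
agent1.sinks.hdfs-out.hdfs.path = /data/weblogs/%Y-%m-%d
agent1.sinks.hdfs-out.hdfs.fileType = DataStream
agent1.sinks.hdfs-out.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-out.channel = mem
```

A multi-hop topology chains such agents with Avro sinks on one hop feeding Avro sources on the next.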
Environment: Cloudera Distribution (CDH), HDFS, Map Reduce, Pig, Hive, Sqoop, Flume, HBase, Java,Maven, Avro, Oozie, ETL and Unix Shell Scripting.
Roles & Responsibilities:
- Used Spring Boot, configured application context files, and performed database object mapping using Hibernate annotations
- Used Spring dependency injection to make the application easy to test and integrate
- Designed and developed the application using the MVC architecture
- The business logic was implemented using Spring MVC and Hibernate
- The presentation layer was implemented using HTML, CSS, and JSP
- Worked on the server-side implementation using Spring Core and Spring annotations, with navigation from the presentation layer to other layers using Spring MVC, and integrated Spring with Hibernate using HibernateTemplate to implement the persistence layer
- Implemented the CRUD operations via SQL queries to the MySQL database
- Used a RESTful client to interact with the services by providing the RESTful URL mappings
- Performed backend unit testing using Junit
- Wrote queries and stored procedures and created tables using SQL Server 2008
- Worked on creating SOAP/REST services with Java and Spring
- Developed the view models and controller MVC action methods to fetch data from the back-end services and send it as JSON objects to the views
- Developed servlets to process update information
- Used JDBC for communicating with the database
- Involved in UI/UX; improved core website functionality by fixing broken links and scripting errors; sent reports to all clients (up-to-date server history, ticket summaries, daily server reports)
Java/J2EE Developer
Roles & Responsibilities:
- Developed servlets and JavaServer Pages (JSP) to route the submittals to the EJB components; JavaScript handled the front-end validations
- Designed and developed a reconciliation module to generate invoices using PL/SQL, stored procedures, and triggers
- Involved in designing and developing dynamic web pages using HTML and JSP with Struts tag libraries
- Used JPA to persistently store large amounts of data in the database
- Implemented modules using Java APIs, Java collections, threads, and XML, and integrated the modules
- Applied J2EE design patterns such as Factory, Singleton, Business Delegate, DAO, Front Controller, and MVC
- The application is implemented using JSP, and servlets are used for implementing the business logic
- Developed utility and helper classes and server-side functionality using servlets
- Created DAO classes and wrote various SQL queries to perform DML operations on the data per the requirements
- Used Log4j for External Configuration Files and debugging.
- Created session beans and controller servlets for handling HTTP requests from JSP pages
- Used the Eclipse IDE for developing the application
- Set up the environment, including deploying the application and maintaining the web server
- Analyzed defects and provided fixes
- Developed XML files using XPath and XSLT, with parsing using both SAX and DOM parsers
- Designed and developed XSL stylesheets using XSLT to transform XML and display the customer information on the screen for the user, and also for processing
- Used JUnit for testing Java classes
Work Environment: Java, J2EE, Oracle, RMI, ClearCase, JDBC, UNIX, JUnit, Eclipse, Struts, XML, XSLT, XPath, XHTML, CSS, HTTP