- Around 8 years of IT experience in project development, implementation, deployment, and maintenance using Big Data Hadoop ecosystem technologies in the Insurance, Health Care, and Retail industry sectors, with programming expertise in multiple languages such as Java and Python.
- 5+ years of Hadoop Developer experience in designing and implementing complete end-to-end Hadoop infrastructure using HDFS, MapReduce, HBase, Spark, Yarn, Kafka, Zookeeper, Pig, Hive, Sqoop, Storm, Oozie, and Flume.
- 2+ years of Java programming experience in developing web-based applications and Client-Server technologies.
- In-depth understanding of Hadoop architecture and its various components such as Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager, and MapReduce concepts.
- Strong experience creating real time data streaming solutions using Apache Spark Core, Spark SQL and Data Frames.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Experience with Partitioning and Bucketing concepts in Hive; designed both Managed and External tables in Hive to optimize performance.
- Collecting and aggregating a large amount of Log Data using Apache Flume and storing data in HDFS for further analysis.
- Job workflow scheduling and monitoring using tools like Oozie.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Comprehensive experience in building web-based applications using J2EE frameworks like Spring, Hibernate, Struts, and JMS.
- Worked in complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) using Agile Methodologies.
- Transformed existing programs into a Lambda architecture.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Experience in installation, configuration, support and monitoring of Hadoop clusters using Apache, Cloudera distributions and AWS.
- Experience in working with various Cloudera distributions (CDH4/CDH5) and Hortonworks.
- Assisted in Cluster maintenance, Cluster Monitoring, Managing and Reviewing data backups and log files.
- Experience in different layers of Hadoop Framework - Storage (HDFS), Analysis (Pig and Hive), Engineering (Jobs and Workflows).
- Good knowledge of Amazon AWS concepts like EMR and EC2 web services, which provide fast and efficient processing of Big Data.
- Expertise in optimizing traffic across network using Combiners, joining multiple schema datasets using Joins and organizing data using Partitions and Buckets.
- Hands on experience in Sequence files, RC files, Combiners, Counters, Dynamic Partitions, Bucketing for best practice and performance improvement.
- Experienced in using Integrated Development environments like Eclipse, NetBeans and IntelliJ
- Migrated data from different databases (i.e., Oracle, DB2, Cassandra, MongoDB) to Hadoop.
- Generated ETL reports using Tableau and created statistics dashboards for Analytics.
- Familiarity with common computing environments (e.g., Linux, Shell Scripting).
- Familiar with the Java virtual machine (JVM) and multi-threaded processing.
- Detailed understanding of the Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
- Development experience with Big Data/NoSQL platforms, such as MongoDB and Apache Cassandra.
- Worked on and migrated RDBMS databases into different NoSQL databases.
- Strong hands-on experience with DW platforms and databases like MS SQL Server 2012 and 2008, Oracle 11g/10g/9i, MySQL, DB2, and Teradata.
- Experience in designing and coding web applications using Core Java & web Technologies- JSP, Servlets and JDBC.
- Extensive experience solving analytical problems using quantitative approaches and machine learning methods in R.
- Excellent knowledge of Java and SQL in application development and deployment.
- Familiar with data warehousing "fact" and "dim" tables and star schemas, combined with Google Fusion Tables for visualization.
- Good working experience in PySpark and Spark SQL.
- Experience in creating various database objects like tables, views, functions, and triggers using SQL.
- Good team player with the ability to solve problems and organize and prioritize multiple tasks.
- Excellent technical, communication, analytical, and problem-solving skills, with troubleshooting capabilities and the ability to work well with people, including those from cross-cultural backgrounds.
Hadoop/Big Data: MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Storm, Kafka, Flume, Zookeeper, Oozie, Impala.
Programming Languages: C, Core Java, PL/SQL, Python, R, Scala
Java/J2EE Technologies: Servlets, JSP, JDBC, Java Beans, REST Web services.
Development Tools: Eclipse, NetBeans, SVN, Git, Maven, SOAP UI, JMX Explorer, SQL Developer, QTOAD.
Methodologies: Agile (Scrum/Kanban), UML, Rational Unified Process, and Waterfall.
Monitoring and Reporting: Ganglia, Nagios, Custom Shell scripts.
NoSQL Technologies: Accumulo, Cassandra, MongoDB, HBase
Frameworks: MVC, Struts, Hibernate, and Spring.
Scripting Languages: Unix Shell Scripting, SQL
Distributed platforms: Hortonworks, Cloudera
Databases: Oracle 11g, MySQL, MS-SQL Server, Teradata.
Operating Systems: Windows XP/Vista/7/8/10, UNIX, Linux
Software Package: MS Office 2007/2010.
Web/ Application Servers: WebLogic, WebSphere, Apache Tomcat
Visualization: Tableau, Kibana and MS Excel
Confidential, Greenwood Village, CO
- Consult with leadership/stakeholders to share design recommendations, identify product and technical requirements, resolve technical problems, and suggest Big Data based analytical solutions.
- Implement solutions for ingesting data from various sources and processing the Datasets utilizing Big Data technologies such as Hadoop, Hive, Kafka, Map Reduce Frameworks and Cassandra
- Installation and maintenance of Hadoop and Spark clusters for both dev and production environments.
- Analyze source systems like Oracle, RDBMS database tables, perform analysis, data modelling from source to target mapping and build data pipelines to ingest data into Hadoop as per the business requirement.
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Installing and configuring EC2 instances on Amazon Web Services (AWS) for establishing clusters on cloud
- Design and develop real-time data streaming solutions using Apache Kafka and build data pipelines to store big datasets in NoSQL databases like Cassandra.
- Create Cassandra keyspaces and tables, using the map type to store JSON records.
- Develop Kafka Producers and Consumers from scratch as per the business requirements.
- Implement a POC using Apache Kafka and Spark with Java to parse real-time data coming from event logs and store it in Cassandra tables to generate reports.
- Develop data pipeline using Sqoop to ingest billing and order events data from Oracle tables into Hive tables.
- Create Flume source and sink agents to ingest log files from an SFTP server into HDFS for analysis.
- Create Hive tables as per requirements, either internal or external, defined with appropriate static and dynamic partitions for efficiency.
- Create visualizations of Cassandra datasets through Apache Zeppelin to study customer order summary reports.
- Develop end-to-end data processing pipelines that begin with receiving data via the distributed messaging system Kafka and end with persisting the data into Cassandra.
- Extract key-value pairs from XML files using Spark and store the data in Cassandra tables.
- Implement Spark applications and Spark SQL to create RDDs and DataFrames from large datasets, caching them for faster querying and processing.
- Worked on Spark SQL; created DataFrames by loading data from Hive tables, created prepped data, and stored it in AWS S3.
- Optimize the performance of Hive joins and queries by creating Partitions and Bucketing in Hive and using Map-Side joins for large datasets.
- Develop Spark SQL to load tables into HDFS and run select queries on top of them.
- Scheduling jobs using Oozie actions like Shell action, Spark action and Hive action.
Environment: Hadoop, Hortonworks, Linux, Hive, MapReduce, HDFS, Kafka, Spark, Cassandra, Shell Scripting, Sqoop, Java, Maven, Spring Framework, Jira, Zeppelin, Oracle Database, AWS S3, EC2, Redshift.
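The XML-to-key-value extraction step above can be sketched outside Spark for brevity; a minimal illustration in plain Python (the element names and record shape are hypothetical, and the real job ran this as a Spark transformation before writing to Cassandra):

```python
import xml.etree.ElementTree as ET

def xml_to_kv(xml_text):
    """Flatten one simple XML record into a dict of tag -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

# Hypothetical order event, similar in shape to the parsed event logs.
record = "<order><id>1001</id><status>SHIPPED</status><total>59.99</total></order>"
kv = xml_to_kv(record)  # each tag/text pair would map to a Cassandra column
```

In the pipeline described above, each flattened record would correspond to one Cassandra row.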
Confidential, Englewood, CO
- Perform best practices for the complete software development life cycle including analysis, design, coding standards, code reviews, source control management and build processes
- Work collaboratively with all levels of business stakeholders to architect, implement, and test Big Data based analytical solutions from disparate sources.
- Coordinate wif the Architects, Manager and Business team on current programming tasks.
- Document and demonstrate solutions by developing documentation, flowcharts, layouts, and code comments
- Producing detailed specifications and writing the program code.
- Testing the code in controlled, real situations before deploying into production.
- Perform daily analysis on Teradata/Oracle source databases to implement ETL logic and data profiling.
- Import data from different data sources like Teradata and Oracle into HDFS using Sqoop, perform transformations using Hive and MapReduce, and then load the data into HDFS.
- Process data ingested into HDFS using Sqoop and custom HDFS adaptors; analyze the data using Spark, Hive, and MapReduce and produce summary results from Hadoop to downstream systems.
- Develop simple to complex MapReduce streaming jobs in Java for processing and validating the data.
- Used Accumulo NoSQL database as a disk persistent layer for GridGain cache layer.
- Build applications using Maven and integrate with Continuous Integration servers like Jenkins.
- Design and develop REST APIs that allow sophisticated, effective, and low-cost application integration.
- Create/modify shell scripts for scheduling data cleansing scripts and the ETL loading process.
- Solve performance issues in Hive and Pig scripts by understanding joins, grouping and aggregation, and how they translate to MapReduce jobs.
- Create Sqoop jobs with incremental load to populate Hive external tables with partitioning and bucketing enabled.
- Implement ETL logic, solving failover scenarios, to ingest data from external sources through SFTP servers into HDFS.
- Transferred data using the Informatica tool from AWS S3 to AWS Redshift. Involved in file movements between HDFS and AWS S3.
- Developed end-to-end data processing pipelines that begin with receiving data via the distributed messaging system Kafka and end with persisting the data into HBase.
- Develop Spark applications to perform all the data transformations on User behavioral data coming from multiple sources.
- Used the DataStax Spark-Cassandra connector to build a data ingest pipeline from Cassandra to the Hadoop platform.
- Create Hive queries that help market analysts spot emerging trends and promotions by comparing fresh data with reference tables and historical metrics.
- Create components like Hive UDFs for missing functionality in Hive for analytics.
- Work on a Hadoop cluster with Active Directory and Kerberos implemented to establish SSO connections with applications and perform various operations.
- Work on various performance optimizations like using distributed cache for small datasets, Partitioning and Bucketing in Hive, and Map-Side joins.
- Schedule jobs using UC4 automation framework to run daily, weekly and monthly basis.
- Design and develop Map Reduce jobs to process data coming in different file formats like XML, CSV, flat files and JSON.
- Implement an Elasticsearch instance to read log files from servers, generate reports, and visualize the data on a Kibana dashboard.
- Monitor application health and performance using the Nagios system. Provide production support and troubleshoot requests from end users.
- Participate in triage calls to handle and resolve defects reported by the Admin and QA teams.
Environment: Hadoop, Hortonworks, Linux, Hive, MapReduce, HDFS, HBase, Pig, Shell Scripting, Sqoop, Hue, Ambari, Yarn, Tez, Accumulo, GridGain, AWS S3, EC2, Python, Git, Kafka, Storm, Apache Spark, Elasticsearch, Kibana, Kerberos, Nagios, Toad for Oracle, Teradata SQL Developer.
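The MapReduce streaming validation jobs mentioned above reduce to a map function and a reduce function; a minimal sketch, in Python for brevity (the pipe-delimited field layout is an assumption, and under Hadoop Streaming the mapper and reducer would read stdin):

```python
def map_line(line):
    """Mapper: emit (record_type, 1) for pipe-delimited lines that pass
    a basic schema check; everything else is counted as INVALID."""
    fields = line.rstrip("\n").split("|")
    if len(fields) == 3 and fields[2].isdigit():
        return [(fields[0], 1)]
    return [("INVALID", 1)]

def reduce_pairs(pairs):
    """Reducer: sum counts per key, as Hadoop does after the shuffle."""
    counts = {}
    for key, n in pairs:
        counts[key] = counts.get(key, 0) + n
    return counts

sample = ["ORDER|abc|42", "bad-line", "ORDER|def|7"]
pairs = [p for line in sample for p in map_line(line)]
summary = reduce_pairs(pairs)  # {'ORDER': 2, 'INVALID': 1}
```

The production jobs applied richer validation rules, but the emit-then-aggregate shape is the same.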
Confidential, Chevy Chase, MD
- Devised and led the implementation of the next-generation architecture for more efficient data ingestion and processing.
- Worked with highly unstructured and semi-structured data, 90 TB in size, with a replication factor of 3.
- Developed simple to complex MapReduce streaming jobs in Java for processing and validating the data.
- Developed data pipeline using Map Reduce, Flume, Sqoop and Pig to ingest customer behavioral data into HDFS for analysis.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
- Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HDFS.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes, resolved technical problems, and suggested solutions based on a Lambda architecture.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Good working knowledge of Amazon Web Service components like EC2, EMR, S3, etc.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Solved performance issues in Hive and Pig scripts by understanding joins, grouping and aggregation, and how they translate to MapReduce jobs.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Streamed data in real time using Spark with Kafka.
- Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
- Implemented Spark using Python (pySpark) and Spark SQL for faster testing and processing of data.
- Experience in developing regression models in R for statistical analysis.
- Experience in writing R or Python code using current best practices, including reproducible research and web-based data visualization.
- Identify concurrent job workloads that may affect, or be affected by, failures or bottlenecks.
- Developed some utility helper classes to get data from HBase tables.
- Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
- Professional experience with NoSQL HBase solutions to solve real-world scaling problems.
- Attended daily status calls following the Scrum process to complete each user story within the timeline.
- Implemented clusters for the NoSQL tools Cassandra and MongoDB as part of a POC to address HBase limitations.
- Worked on the implementation of a toolkit that abstracted Solr and Elasticsearch.
- Viewed various aspects of the cluster using Cloudera Manager.
Environment: Hadoop, Linux, CDH4, MapReduce, HDFS, Hbase, Hive, Pig, Shell Scripting, Sqoop, Python, Java 7, MySQL, NoSQL, Eclipse, Oracle 11g, Maven, Log4j, Git, Kafka, Storm, Apache Spark, Elastic Search, Solr, Datameer.
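The Flume-staged log analysis above boils down to grouping and counting; a pure-Python stand-in for that kind of Hive aggregation (the "timestamp level message" line layout is an assumption):

```python
from collections import Counter

def aggregate_log_levels(lines):
    """Count log lines per severity level, the way a GROUP BY in Hive
    would over logs staged in HDFS by Flume."""
    levels = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)  # timestamp, level, rest of message
        if len(parts) >= 2:
            levels[parts[1]] += 1
    return dict(levels)

logs = [
    "2016-01-01T00:00:01 ERROR disk full",
    "2016-01-01T00:00:02 INFO heartbeat ok",
    "2016-01-01T00:00:03 ERROR disk full",
]
by_level = aggregate_log_levels(logs)  # {'ERROR': 2, 'INFO': 1}
```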
Confidential, Conway, AR
- Loaded data from different data sources (Teradata, DB2, Oracle, and flat files) into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Created different Pig scripts and wrapped them as shell commands to provide aliases for common operations in the project's business flow.
- Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes.
- Created several Hive UDFs to hide or abstract complex repetitive rules.
- Developed Oozie workflows for daily incremental loads, which get data from Teradata and then import it into Hive tables.
- Involved in End to End implementation of ETL logic.
- Reviewing ETL application use cases before on boarding to Hadoop.
- Developed bash scripts to bring log files from the FTP server and then process them to load into Hive tables.
- Developed Map Reduce programs for applying business rules to the data.
- Implemented Apache Kafka as a replacement for a more traditional message broker (JMS Solace) to reduce licensing costs, decouple processing from data producers, and buffer unprocessed messages.
- Created HBase tables and column families to store the user event data.
- Written automated HBase test cases for data quality checks using HBase command line tools.
- Implemented a receiver-based approach in Spark Streaming, linking with the StreamingContext using the Java API and handling proper closing of and waiting for stages.
- Maintained the authentication module to support Kerberos.
- Experience in Implementing Rack Topology scripts to the Hadoop Cluster.
- Participated wif the admin team in designing and upgrading CDH 3 to HDP 4.
- Developed helper classes for abstracting Cassandra cluster connections, acting as a core toolkit.
- Enhanced existing modules written in Python.
- Used dashboard tools like Tableau.
Environment: Hadoop, Linux, MapReduce, HDFS, HBase, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Java, Eclipse, Oracle 10g, Maven, open-source technologies (Apache Kafka, Apache Spark, ETL, Hazelcast), Git, Mockito, Python.
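The automated HBase data-quality checks described above amount to per-row rule evaluation; a minimal sketch (the column names and the choice of rules are hypothetical):

```python
def check_row(row, required, numeric):
    """Return data-quality violations for one ingested row:
    required columns must be present and non-empty, and numeric
    columns must parse as non-negative numbers."""
    problems = []
    for col in required:
        if not row.get(col):
            problems.append("missing:" + col)
    for col in numeric:
        value = str(row.get(col, ""))
        if value and not value.replace(".", "", 1).isdigit():
            problems.append("non_numeric:" + col)
    return problems

good = {"event_id": "e1", "amount": "9.50"}
bad = {"event_id": "", "amount": "N/A"}
```

In practice, rows failing such checks were flagged before the data was trusted for downstream queries.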
Confidential, San Francisco, CA
- Installed and configured Hadoop Ecosystem components and Cloudera manager using CDH distribution.
- Frequent interactions with business partners.
- Designed and developed a Medicare-Medicaid claims system using Model-driven architecture on a customized framework built on Spring.
- Moved data from HDFS to Cassandra using Map Reduce and BulkOutputFormat class.
- Imported trading and derivatives data into the Hadoop Distributed File System and ecosystem (MapReduce, Pig, Hive, Sqoop).
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Created tables in HBase and loaded data into them.
- Developed scripts to load data from HBase into the Hive metastore and perform MapReduce jobs.
- Was part of an activity to set up the Hadoop ecosystem in the dev and QA environments.
- Managed and reviewed Hadoop Log files.
- Responsible for writing Pig scripts and Hive queries for data processing.
- Ran Sqoop for importing data from Oracle and other databases.
- Creation of shell script to collect raw logs from different machines.
- Created static and dynamic partitions in Hive.
- Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, and UNION.
- Optimized Hive tables using optimization techniques like partitioning and bucketing to provide better performance with HiveQL queries.
- Defined Pig UDFs for financial functions such as swaps, hedging, speculation, and arbitrage.
- Coded many MapReduce programs to process unstructured log files.
- Worked on importing and exporting data into HDFS and Hive using Sqoop.
- Used different data formats (Text format and Avro format) while loading the data into HDFS.
- Used parameterized Pig scripts and optimized them using ILLUSTRATE and EXPLAIN.
- Involved in configuring HA, resolving Kerberos security issues, and name node failure restoration activities from time to time as part of zero-downtime operations.
- Implemented FAIR Scheduler as well.
Environment: Hadoop, Linux, MapReduce, HDFS, HBase, Hive, Pig, Shell Scripting, Sqoop, CDH Distribution, Windows, Java 6, Eclipse, Ant, Log4j, and JUnit
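The Pig Latin operator chains listed above most often follow a FILTER, then GROUP, then COUNT pattern; a rough Python analogue (the field names and predicate are hypothetical):

```python
def filter_group_count(records, keep, key):
    """Rough analogue of a Pig FILTER ... GROUP BY ... COUNT pipeline:
    keep records passing the predicate, group by a field, count each group."""
    grouped = {}
    for rec in records:
        if keep(rec):
            grouped.setdefault(rec[key], []).append(rec)
    return {k: len(v) for k, v in grouped.items()}

trades = [
    {"instrument": "swap", "notional": 5_000_000},
    {"instrument": "swap", "notional": 0},
    {"instrument": "hedge", "notional": 250_000},
]
counts = filter_group_count(trades, lambda r: r["notional"] > 0, "instrument")
# counts == {'swap': 1, 'hedge': 1}
```

In Pig, the same shape would be a FILTER BY predicate, a GROUP BY on the key, and a FOREACH ... GENERATE COUNT over each group.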
Confidential, Boston, MA
- Wrote design documents based on requirements from the MMSEA user guide.
- Performed requirement gathering, design, coding, testing, implementation and deployment.
- Worked on modeling of Dialog process, Business Processes and coding Business Objects, Query Mapper and JUnit files.
- Involved in the design and creation of Class diagrams, Sequence diagrams and Activity Diagrams using UML models
- Created the Business Objects methods using Java and integrating the activity diagrams.
- Involved in developing JSP pages using Struts custom tags, jQuery and Tiles Framework.
- Worked in web services using SOAP, WSDL.
- Wrote Query Mappers and JUnit test cases, with experience in MQ.
- Managed software configuration using ClearCase and SVN.
- Design, develop and test features and enhancements.
- Performed error rate analysis of production issues and technical errors.
- Developed a test environment for testing all the web services exposed as part of the core module and their integration with partner services in integration testing.
- Analyze user requirement documents and develop test plans, which include test objectives, test strategies, test environment, and test priorities.
- Responsible for performing end-to-end system testing of the application, writing JUnit test cases.
- Perform Functional testing, Performance testing, Integration testing, Regression testing, Smoke testing and User Acceptance Testing (UAT).
- Converted complex SQL queries running on mainframes into Pig and Hive as part of a migration from mainframes to the Hadoop cluster.
- Involved in various SDLC phases like Design, Development and Testing.
- Used various Core Java concepts such as Exception Handling and the Collections API to implement various features and enhancements.
- Developed server-side servlet components for the application.
- Involved in coding, maintaining, and administering Servlet and JSP components to be deployed on a WebSphere application server.
- Used automated test scripts and tools to test the application in various phases. Coordinated with Quality Control teams to fix issues that were identified.
- Implemented Hibernate ORM to map relational data directly to Java objects.
- Worked wif Complex SQL queries, Functions and Stored Procedures.
- Involved in developing the Spring Web MVC framework for portal applications.
- Implemented the logging mechanism using log4j framework.
- Developed REST API, Web Services.
- Wrote test cases in JUnit for unit testing of classes.
- Used Maven to build the J2EE application.
- Used SVN to track and maintain the different version of the application.
- Involved in the maintenance of different applications with the onshore team.
- Good working experience in Tapestry claims processing.
- Working experience with professional billing claims.
Environment: Java, Spring Framework, Struts, Hibernate, RAD, SVN, Maven, WebSphere Application Server, Web Services, Oracle Database 11g, IBM MQ, JMS, HTML, JavaScript, XML, CSS, REST API.