- Overall 8+ years of experience in designing and building highly scalable distributed systems using Apache Hadoop, Apache Spark, and Java/J2EE.
- Strong hands-on experience with Spark Core, Spark SQL, Spark Streaming, and Spark machine learning using the Scala and Python programming languages.
- Strong understanding of RDD operations in Apache Spark, including Transformations, Actions, Persistence (Caching), Accumulators, Broadcast Variables, and broadcast optimization.
- In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG Scheduler, Task Scheduler, stages, and tasks.
- Experience in submitting Apache Spark and MapReduce jobs to YARN.
- Good understanding of the Driver, Executors, and the Spark web UI.
- Migrated Python machine learning modules to scalable, high-performance, fault-tolerant distributed systems such as Apache Spark.
- Strong experience with Spark SQL UDFs, Hive UDFs, and Spark SQL performance tuning.
- Experience in installing, configuring, supporting, and monitoring Hadoop clusters using Hortonworks, Cloudera Manager, and Amazon Web Services (AWS), with S3 for storage and instances running on EC2.
- Used Amazon EMR to run MapReduce jobs in the cloud.
- Experience in real-time processing using Apache Spark and Kafka.
- Used ZooKeeper for cluster configuration and management on a distributed HBase deployment.
- Hands-on experience building services with Amazon Kinesis Analytics to process streaming data.
- Good knowledge and understanding of Hadoop architecture and the various components of the Hadoop ecosystem: HDFS, MapReduce, Sqoop, and Hive.
- Good understanding of the MapReduce framework architectures (MRv1 and YARN).
- Developed various MapReduce applications to perform ETL workloads on metadata and terabytes of data.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes.
- Experience in working with Flume to load log data from multiple sources directly into HDFS.
- Good knowledge of importing data from Teradata using Sqoop.
- Good working knowledge of creating Hive tables and writing HiveQL (HQL) for data analysis to meet business requirements.
- Experience in managing and reviewing Hadoop log files.
- Good working experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Experience in fine-tuning MapReduce jobs for better scalability and performance.
- Experience in writing shell scripts to move shared data from landing zones into HDFS.
- Experience in performance-tuning Hadoop clusters by gathering and analyzing data on the existing infrastructure.
- Worked with the Avro data serialization system.
- Hands-on experience working with input file formats such as ORC, Parquet, JSON, and Avro.
- Good expertise in coding with Python, Scala and Java.
- Expertise in client-side design and validation using HTML and JavaScript.
- Well versed in Core Java, J2EE, Web Services (SOA), JDBC, Swing, MySQL, and DB2.
- Software development experience in Java application development and client/server applications, implementing application environments using MVC, J2EE, JDBC, JSP, Servlets, XML technologies (XML, XSL, XSD), Web Services, Oracle, and relational databases.
- Exceptional ability to quickly master new concepts and capable of working in groups as well as independently.
- Good analytical, programming, problem solving and troubleshooting skills.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
- Excellent written and verbal communication and analytical skills with a customer-service-oriented attitude; worked with the offshore team as onsite coordinator, providing updates on a daily basis.
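As a minimal illustration of the MapReduce model referenced in the bullets above, the classic word count can be sketched in plain Python; this is a framework-free sketch (no Hadoop required), and all names are illustrative:

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit (word, 1) pairs, analogous to a Mapper's output.
    for line in records:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate each key's values into a final count.
    return {key: sum(values) for key, values in grouped.items()}

records = ["Spark and Hadoop", "spark streaming"]
counts = reduce_phase(shuffle(map_phase(records)))
# counts["spark"] is 2: both lines contribute after lower-casing.
```

The same map/shuffle/reduce split is what a Hadoop MapReduce job or a Spark `map`/`reduceByKey` pipeline performs at cluster scale.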
Programming languages: Scala, Python and Java.
Big Data Frameworks: Apache Spark, Apache Hadoop, Hive, Cassandra, MongoDB, Flume, Sqoop, MapReduce.
Data visualization: Tableau
Big data distribution: Cloudera, Amazon EMR
Java/J2EE Technologies: Core Java, Hibernate, Spring.
Operating Systems: Ubuntu, CentOS, macOS, Windows.
Databases: Oracle 10g, MySQL, PostgreSQL.
Development Tools: Eclipse, PyCharm, Jupyter Notebook, Databricks Notebook, IntelliJ IDEA.
Development methodologies: Agile, Waterfall
Messaging Services: ActiveMQ, Kafka, JMS.
Version control Tools: GitHub, SVN.
Workflow scheduling system: Apache Oozie, Airflow
Confidential - Kansas City, Missouri
- Worked on analyzing Hadoop clusters using different big data analytics tools, including Flume, Hive, Sqoop, and Spark.
- Developed Spark code using Scala for faster processing of data.
- Followed the Agile development methodology to develop the application.
- Installed and configured Hadoop clusters on different platforms such as Cloudera, Pivotal HD, and AWS EMR, along with ecosystem components such as Sqoop, HBase, Hive, and Spark.
- Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
- Developed Spark and Spark SQL/Streaming code for faster testing and processing of data.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Integrated Kafka with Spark Streaming for high-speed data processing.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively.
- Integrated Apache Storm with Kafka to perform web analytics.
- Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Designed the ETL process and created the high-level design document, including the logical data flows, source data extraction process, database staging, extract creation, source archival, job scheduling, and error handling.
- Collected data using Spark Streaming, stored it in an AWS S3 bucket in near real time, and performed the necessary transformations and aggregations to build the data model and persist the data into HDFS.
- Worked with the Talend ETL tool, using features such as context variables and database components (Oracle input, Oracle output, tFileCompare, tFileCopy, Oracle close).
- Created ETL mappings with Talend Integration Suite to pull data from sources, apply transformations, and load data into the target database.
- Created, altered, and deleted topics (Kafka queues) as required.
- Performance tuning using partitioning and bucketing of Impala tables.
- Experience with NoSQL databases such as HBase and MongoDB.
- Involved in cluster maintenance and monitoring.
- Involved in loading data from the UNIX file system to HDFS.
- Created an e-mail notification service that, upon job completion, alerts the team that requested the data.
- Worked on NoSQL databases, which differ from classic relational databases.
- Successful in creating and implementing complex code changes.
- Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Configured EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
Environment: Hadoop v2/YARN 2.4, Spark, Amazon Web Services (AWS), MapReduce, Teradata 15.0, Hive, REST, Sqoop, Flume, Kafka, Impala, MongoDB.
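The partitioning and bucketing of tables mentioned in this role can be sketched as a Hive-style DDL fragment; the table and column names below are hypothetical, not taken from an actual project:

```sql
-- Illustrative sketch only: table and column names are hypothetical.
CREATE TABLE clickstream (
  user_id BIGINT,
  url     STRING,
  ts      TIMESTAMP
)
PARTITIONED BY (event_date STRING)      -- enables partition pruning at query time
CLUSTERED BY (user_id) INTO 32 BUCKETS  -- bucketing evens out data for joins and sampling
STORED AS PARQUET;
```

Partitioning lets the engine skip whole directories for date-filtered queries, while bucketing co-locates rows with the same key to speed up joins.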
Confidential - San Jose, California
- Day-to-day responsibilities included solving developer issues, deploying code from one environment to another, providing access to new users, providing immediate solutions to reduce impact, documenting those solutions, and preventing future issues.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades.
- Worked with cloud services such as Azure; involved in ETL, data integration, and migration.
- Wrote Azure PowerShell scripts to copy or move data from the local file system to HDFS Blob storage.
- Implemented OLAP multidimensional cube functionality using Azure SQL Data Warehouse.
- Wrote Lambda functions in Python which invoke Python scripts to perform various transformations and analytics on large data sets in EMR clusters.
- Played a lead role in the development of the Confidential Data Lake and in building the Confidential Data Cube on a Microsoft Azure HDInsight cluster.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
- Designed, built, and deployed a multitude of applications utilizing nearly the entire Azure stack, focusing on high availability, fault tolerance, and auto-scaling.
- Designed and developed automation test scripts using Python.
- Azure Cloud Infrastructure design and implementation utilizing Azure Resource Manager (ARM) templates.
- Orchestrated hundreds of Sqoop scripts, Python scripts, and Hive queries using Oozie workflows and sub-workflows.
- Experience in ingesting incremental updates from structured ERP systems residing on a Microsoft SQL Server database onto the Hadoop data platform using Sqoop.
- Implemented custom interceptors for Flume to filter data, and defined channel selectors to multiplex data into different sinks.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Worked on analyzing Hadoop clusters and different big data analytics tools, including the HBase database and Sqoop.
- Created and truncated HBase tables in Hue and took backups of submitter ID(s).
- Used Amazon EMR for MapReduce jobs, tested locally using Jenkins.
- Created and managed Azure Web Apps and provided access permissions to Azure AD users.
- Involved in loading data from the Linux file system to HDFS.
- Experience in configuring Storm to load data from MySQL to HBase using Java Message Service.
- Worked with BI teams in generating the reports and designing ETL workflows on Tableau.
Environment: HDFS, MapReduce, Hive, Hue, Azure, Flume, Oozie, Sqoop, Apache Hadoop, Spark, Python, Qlik, Hortonworks, Ambari, Red Hat, MySQL, and Oracle.
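The Oozie orchestration of Sqoop scripts and Hive queries described in this role typically takes the shape of a workflow definition like the following sketch; the action names, properties, and script names here are illustrative only, not from an actual deployment:

```xml
<!-- Illustrative Oozie workflow sketch: names and properties are hypothetical. -->
<workflow-app name="ingest-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="sqoop-import"/>
  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect ${jdbcUrl} --table ORDERS --target-dir ${stagingDir} --incremental append --check-column ORDER_ID</command>
    </sqoop>
    <ok to="hive-load"/>
    <error to="fail"/>
  </action>
  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_orders.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Ingest failed</message></kill>
  <end name="end"/>
</workflow-app>
```

Each action's `ok`/`error` transitions chain the Sqoop import into the Hive load, which is how incremental ERP ingests are commonly sequenced.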
Confidential - St. Louis, MO
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Wrote reference architecture documents describing Hortonworks platform integration with other platforms (Informatica) and analytic applications (Excel, Tableau).
- Wrote sandbox tutorials that demonstrated the functionality of the Hortonworks platform: HDFS (Hadoop), Pig, Hive, HBase, and HCatalog. Topics included how to install and configure the Hortonworks ODBC drivers, refining data with the Hortonworks platform, and analyzing data with third-party BI tools (Excel, Tableau).
- Installed virtual machines on Windows and Mac using Oracle VirtualBox and VMware. Installed and documented Hortonworks platform features on Windows 7, Windows Server 2012, and other operating systems.
- Managed the delivery pipeline; tuned OS parameters, SOA application throttling, connection pools, Mediator worker threads, audit flow, and JVM heap garbage collection; applied security patches and bug fixes.
- Developed server-side services using Java, Spring, Web Services (SOAP, RESTful, WSDL, JAXB, JAX-RPC), and SOA (service-oriented architecture).
- Experience in administering, installing, upgrading, and managing CDH3, Pig, Hive, and HBase.
- Imported and exported data into HDFS and Hive using Sqoop and Flume.
- Experienced in defining job flows.
- Knowledge in performance troubleshooting and tuning Hadoop clusters.
- Participated in development/implementation of Cloudera Hadoop environment.
- Responsible for the system user interface creation.
- Developed and implemented the GUI of the system (Java 2.0, Swing).
- Developed server-side business logic software modules using JDBC.
- Proficient in database development with MySQL.
- Assisted in developing the system management and outpatient modules using Java and Swing.
- Developed applications based on J2EE using the Hibernate, Spring, and JSP frameworks and SOAP/REST web services.
- Experience in preparing unit test cases.
- Participated in code reviews conducted by peers.
Environment: Apache Hadoop, HDFS, Hive, MapReduce, Java, Pig, Sqoop, Flume, Cloudera CDH3, Oozie, Oracle, MySQL, Swing, JSP, JDBC, JUnit.
- Handled three modules (Login, Register, and Update Profile) in this application, which followed the Waterfall methodology.
- Involved in changing the look and feel (UI) of web pages per requirements by modifying the JSPs and HTML.
- Involved in code review, design review, and documentation of the design document.
- Involved in modifying the Java backend objects (EJBs, XMLs, and Servlets).
- Modified the DAOs as part of the enhancements.
- Modified the JDBC connection parameters in the application.
- All Java code development was done in NetBeans.
- Involved in resolving the fixes to the tickets at the time of production deployment.
- Maintained the database by purging old data and inserting new data (user information) on a daily basis, by running UNIX scripts.
- Involved in the unit testing and UAT testing for each release.
- Maintained all the standards of GE while doing the code change.
Environment: J2EE, EJB 1.1, XML, XSLT, WebSphere 4.0, Oracle 8i, JDBC, LDAP, Struts, PL/SQL, Toad, NetBeans, JSP, HTML.
- The entire SCADA application was developed from scratch using Java technologies due to the complex nature of the requirements.
- Implemented an SOA architecture for communication between individual PLCs, and between PC and PLC, using XML-RPC, which exposes each device as a service.
- Implemented industry best practices to improve the quality of the product development.
- Provided a client/server solution using XML-RPC communication for scalability of the system.
- Provided reporting solutions with PDF export format.
- Provided complete multi-threaded communication platform for visualization, recipe download, configuration, setup, utilities, maintenance.
- Provided system security with task-based permissions and privileges for users and roles.
- Implemented the Alarms, Messages, and Error Report web services.
- Experience in developing applications using Eclipse IDE and running test cases.
- Analyzed, created and proposed remediation measures to fix the bugs in the application.
- Developed action classes and configuration files for struts framework.
- Developed Oracle stored procedures and complex SQL.
- Created and maintained PL/SQL procedures that run in the background, freeing the current threads from database deadlocks.
- Tested the application thoroughly in Unit Testing phase using JUNIT.
Environment: Java, XML-RPC, Web services, Struts, JSP, Oracle, JUnit, JDBC, UML, Eclipse, PL/SQL, Core Java.
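The XML-RPC client/server pattern used in this project can be sketched with the Python standard library. The original system was Java; this Python sketch, with a hypothetical read_tag method standing in for a PLC read, only illustrates the exposing-a-device-as-a-service idea:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def read_tag(name):
    # Hypothetical stand-in for reading a value from a PLC; a real system
    # would talk to device hardware here.
    return {"temperature": 72}.get(name, 0)

# Bind to port 0 so the OS picks a free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), allow_none=True, logRequests=False)
server.register_function(read_tag)
port = server.server_address[1]

# Serve requests on a background thread, as a device-side service would.
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

# A client (e.g. the PC-side visualization layer) calls the device's
# method over XML-RPC as if it were local.
client = ServerProxy(f"http://127.0.0.1:{port}")
value = client.read_tag("temperature")

server.shutdown()
```

Registering one such endpoint per device is what makes each PLC addressable as a service in the SOA layout described above.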