- 8+ years of overall IT experience across a variety of industries, including 4+ years of hands-on experience with Big Data technologies and designing and implementing MapReduce jobs.
- Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience in designing and developing POCs in Spark using Scala to compare Spark's performance with Hive and SQL/Oracle.
- Proficient in implementing HBase and Spark SQL.
- Explored various Spark modules and worked with DataFrames, RDDs, and SparkContext.
- Experience in migrating data between HDFS and relational database systems using Sqoop, according to client requirements.
- Experience in data analysis using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Strong understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase, Cassandra, and MongoDB.
- Proficient in row-key and schema design for NoSQL databases.
- Experience with CQL (Cassandra Query Language) for retrieving data from Cassandra clusters by running queries.
- Proficient in cluster management and Cassandra database configuration.
- Extensive experience importing and exporting data using Flume and Kafka.
- Experience using Flume to load log data from multiple sources directly into HDFS.
- Experience configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
- Proficient in coding optimized Teradata batch-processing scripts for data transformation, aggregation, and loading using BTEQ.
- Extensive experience with Teradata databases and analyzing clients' business needs; developed and performed data integration on top of Hadoop using Talend.
- Built comparison graphs and a comparison matrix with all required details using Talend Open Studio.
- Strong experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
- Good understanding of writing Python scripts.
- Experience working with BI teams and transforming big data requirements into Hadoop-centric technologies.
- Strong experience with data warehousing ETL concepts using Informatica PowerCenter, OLAP, OLTP, and AutoSys.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Good understanding and working experience on Cloud based architectures.
- Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
- Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
- Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
- Experience working with the Spring and Hibernate frameworks in Java.
- Good experience and expertise in Oracle ORMB and stored procedure concepts.
- Experience in Performance Tuning and Query Optimization.
- Experience using various IDEs (Eclipse, IntelliJ) and repositories (SVN, CVS).
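The MapReduce paradigm referenced above can be sketched in miniature as three phases. The snippet below is an illustrative stand-alone simulation in plain Python, not Hadoop code; all names and data are hypothetical:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle_sort(pairs):
    """Shuffle/sort: group intermediate pairs by key, as the framework would."""
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))

def reduce_phase(grouped):
    """Reduce: sum the counts emitted for each distinct word."""
    return {word: sum(count for _, count in values) for word, values in grouped}

lines = ["big data big wins", "data pipelines"]
counts = reduce_phase(shuffle_sort(map_phase(lines)))
# counts == {"big": 2, "data": 2, "wins": 1, "pipelines": 1}
```

In a real Hadoop job the shuffle/sort step is performed by the framework between the map and reduce tasks; only the map and reduce logic is user code.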
Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, HBase, Scala, Spark, Apache Kafka, Cassandra, MongoDB, Solr, Ambari, Ab Initio, Akka Framework
Database: Oracle 10g/11g, PL/SQL, MySQL, MS SQL Server 2012
SQL Server Tools: Enterprise Manager, SQL Profiler, Query Analyzer, SQL Server 2005/2008 Management Studio, DTS, SSIS, SSRS, SSAS
Language: C, C++, Java, Python
Development Methodologies: Agile, Waterfall
Testing: JUnit, Selenium WebDriver
ETL Tools: Talend Open Studio, Pentaho, Tableau
IDE Tools: Eclipse, NetBeans, IntelliJ
Modelling Tools: Rational Rose, StarUML, Visual Paradigm for UML
Architecture: Relational DBMS, Client-Server Architecture, OLAP, OLTP
Cloud Platforms: AWS Cloud, Google Cloud
Operating System: Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS X
Java Hadoop Developer
- Worked on IoT and the ThingSpace Platform, Verizon's built-in authentication service for accessing information and development within the intranet.
- Also worked on Couchbase DB to maintain the application stack built for the COHO product.
- Worked with Docker containers; strong knowledge of building images and composing services.
- Strong experience working with Amazon AWS for accessing Hadoop cluster components.
- Solid understanding of Cloud and Open source technologies - AWS, Docker, Elastic Search, Git, Stash
- Implemented Elasticsearch/Scala/Akka streaming; deployed and maintained multi-node Dev and Test Kafka clusters.
- Experience in job management using the Fair Scheduler; developed job-processing scripts using Oozie workflows.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
- Performed advanced procedures such as text analytics and processing, using Spark's in-memory computing capabilities in Scala.
- Experienced in handling large datasets during the ingestion process itself using partitions, Spark's in-memory capabilities, broadcasts, and effective, efficient joins and transformations; managed and reviewed Hadoop log files.
- Strong working experience running queries against Cassandra clusters to retrieve data.
- Responsible for developing services that integrate two different environments sharing common connectivity, using a simulator built in Java; strong knowledge and work experience building Java services such as REST and SOAP APIs.
- Developed Splunk dashboards to surface log messages, so problems could be identified and resolved as quickly as possible.
- Strong working knowledge of deployment tools such as Mesos and Marathon.
Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elasticsearch, Cassandra, Java APIs, Splunk, Docker, Mesos & Marathon, ReadyAPI for testing Java web services
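The broadcast-join pattern used for efficient joins in this role can be illustrated outside Spark with a plain-Python hash join: the small side is materialized once into a lookup table (as if broadcast to every executor), and the large side streams through it without a shuffle. Table contents and column names below are made up for the example:

```python
def broadcast_hash_join(large_rows, small_rows, key):
    """Simulate a broadcast hash join: build a hash table from the small
    side once, then stream the large side through it, avoiding a full
    shuffle of the large dataset."""
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)
    for row in large_rows:
        for match in lookup.get(row[key], []):
            yield {**row, **match}

events = [{"user_id": 1, "event": "click"}, {"user_id": 2, "event": "view"}]
users = [{"user_id": 1, "name": "ada"}]
joined = list(broadcast_hash_join(events, users, "user_id"))
# joined == [{"user_id": 1, "event": "click", "name": "ada"}]
```

In Spark the same effect comes from broadcasting the small DataFrame so every executor holds a copy; this sketch only shows why that avoids shuffling the large side.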
Hadoop Scala/Spark Developer
- Worked on a multi-node Hadoop cluster.
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Used Spark Streaming APIs to perform transformations and actions on the fly, building the common learner data model that consumes data from Kafka in near real time and persists it into Cassandra.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop; also developed an enterprise application in Scala.
- Expertise in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and memory tuning.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Experience and hands-on knowledge of the Akka and Lift frameworks.
- Used PostgreSQL and NoSQL databases integrated with Hadoop to develop datasets on HDFS.
- Involved in creating partitioned Hive tables and loading and analyzing data using Hive queries.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to adopting the former in the project.
- Developed Hive queries to process data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Good experience with Talend Open Studio for designing ETL jobs to process data; experience designing, reviewing, implementing, and optimizing data transformation processes in the Hadoop and Talend/Informatica ecosystems.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Coordinated with admins and senior technical staff on migrating both Teradata and Ab Initio workloads to Hadoop.
- Configured Hadoop clusters and coordinated with Big Data admins for cluster maintenance.
Environment: Hadoop YARN, Spark-Core, Spark-Streaming, Spark-SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elastic Search, Impala, Cassandra, Tableau, Informatica, Cloudera, Oracle 10g, Linux.
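As a rough illustration of how the partitioned, bucketed Hive tables in this role organize data: each partition value maps to a directory, and within it rows are routed to one of a fixed number of bucket files by hashing the clustering column. The sketch below uses integer modulo as a stand-in for Hive's per-type hash, and the column names are hypothetical:

```python
from collections import defaultdict

def bucket_layout(rows, partition_col, cluster_col, num_buckets):
    """Sketch of a partitioned, bucketed table layout: one directory per
    partition value, num_buckets files inside it, rows routed by the
    cluster column modulo the bucket count (a stand-in for Hive's hash)."""
    layout = defaultdict(lambda: defaultdict(list))
    for row in rows:
        partition = row[partition_col]
        bucket = row[cluster_col] % num_buckets  # Hive's real hash differs
        layout[partition][bucket].append(row)
    return layout

rows = [
    {"dt": "2017-01-01", "user_id": 4, "amt": 10},
    {"dt": "2017-01-01", "user_id": 7, "amt": 5},
    {"dt": "2017-01-02", "user_id": 4, "amt": 2},
]
layout = bucket_layout(rows, "dt", "user_id", 4)
# partition "2017-01-01" holds user 4 in bucket 0 and user 7 in bucket 3
```

Partition pruning skips whole directories at query time, while bucketing makes joins and sampling on the clustering column cheaper because matching rows land in predictable files.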
Confidential, Irving TX
- Detailed understanding of the existing build system and its related tools for tracking product and release information and test results.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
- Developed UDFs to provide custom Hive and Pig capabilities using SOAP/RESTful services.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Built a mechanism in Talend for automatically moving existing proprietary binary-format data files to HDFS using an Ingestion service.
- Performed data transformations in Scala and Hive, using partitions and buckets for performance improvements.
- Wrote custom InputFormat and RecordReader classes for reading and processing the binary format in MapReduce.
- Wrote custom Writable classes for Hadoop serialization and deserialization of time-series tuples.
- Implemented a custom file loader for Pig to enable querying directly on large data files such as build logs.
- Used Python pattern matching on build logs to format errors and warnings.
- Developed Pig Latin scripts and shell scripts for validating the different query modes in Historian.
- Created Hive external tables on the MapReduce output, with partitioning and bucketing applied on top.
- Used Solr for database integration with SQL Server.
- Monitored clusters to provide reporting using Solr.
- Improved performance by tuning Hive and MapReduce jobs in Scala, using Talend, ActiveMQ, and JBoss.
- Developed a daily test engine in Python for continuous testing.
- Expertise in performing Cloudera operations on HDFS data.
- Used shell scripting for Jenkins job automation with Talend.
- Built a custom calculation engine that can be programmed according to user needs.
- Ingested data into Hadoop using shell scripting and Sqoop, and applied data transformations using Pig and Hive.
- Researched, evaluated, and utilized new technologies, tools, and frameworks around the Hadoop ecosystem.
Environment: Apache Hadoop, Hive, Scala, Pig, HDFS, Cloudera, Java MapReduce, Core Java, Python, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Solr, Oracle, and CDH 4.x.
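The Python pattern matching on build logs described in this role might look roughly like the snippet below; the log-line format is hypothetical, since real build logs vary by tool:

```python
import re

# Hypothetical log format: "ERROR: ..." / "WARNING: ..." prefixes.
LOG_PATTERN = re.compile(r"^(?P<level>ERROR|WARN(?:ING)?)\s*:?\s*(?P<message>.+)$")

def classify_log_lines(lines):
    """Split build-log lines into error and warning messages by regex."""
    errors, warnings = [], []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if not match:
            continue  # lines at other levels (INFO, DEBUG, ...) are skipped
        bucket = errors if match.group("level") == "ERROR" else warnings
        bucket.append(match.group("message"))
    return errors, warnings

log = ["ERROR: link failed", "WARNING: unused var x", "INFO: done"]
errors, warnings = classify_log_lines(log)
# errors == ["link failed"], warnings == ["unused var x"]
```

Named groups keep the extraction readable, and the same pattern can be extended with timestamps or file/line captures as a log format requires.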
Confidential, Pasadena, CA
- Created, validated, and maintained scripts to load data manually using Sqoop.
- Created Oozie workflows and coordinators to automate Sqoop jobs on weekly and monthly schedules; worked on reading multiple data formats on HDFS.
- Involved in converting Hive/SQL queries into Spark transformations.
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Expertise in RDBMS, database Normalization and Denormalization concepts and principles.
- Strong experience creating database objects such as tables, views, functions, stored procedures, indexes, triggers, and cursors in Teradata.
- Strong skills in coding and debugging Teradata utilities such as FastLoad, FastExport, MultiLoad, and TPump for high-throughput Teradata ETL processing of huge data volumes.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, Spark and loaded data into HDFS.
- Analyzed the SQL scripts and designed solutions to implement them, running reports in Pig and Hive.
- Developed, validated, and maintained HiveQL queries.
- Accessed MongoDB and Cassandra to perform operations on both clusters.
- Create, validate and maintain scripts to load data from and into tables in SQL Server 2012.
- Wrote MapReduce jobs in Python as well as Java.
- Fetched data to/from HBase using MapReduce jobs.
- Ran reports using Pig and Hive queries.
- Analyzed data with Hive and Pig.
- Designed Hive tables to load data to and from external tables.
- Wrote shell scripting to load data across servers.
- Imported data from MySQL databases into Hive using Sqoop.
- Wrote and implemented Apache Pig scripts to load data from and store data into Hive.
Environment: Hadoop stack (Hive, Pig, HCatalog, Sqoop, Oozie), QlikView, Linux, SQL Server 2010, Bitbucket, Java, Python.
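Writing MapReduce jobs in Python, as in this role, is typically done via Hadoop Streaming, where the mapper and reducer are scripts that read and write tab-separated lines on stdin/stdout. A minimal word-count sketch, demoed here with in-memory streams:

```python
import io

def mapper(stdin, stdout):
    """Streaming mapper: emit one tab-separated (word, 1) pair per word."""
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")

def reducer(stdin, stdout):
    """Streaming reducer: input arrives sorted by key, so counts for each
    word can be accumulated in a single pass."""
    current, total = None, 0
    for line in stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                stdout.write(f"{current}\t{total}\n")
            current, total = word, 0
        total += int(count)
    if current is not None:
        stdout.write(f"{current}\t{total}\n")

# Demo: run both phases, simulating the framework's sort step in between.
mapped = io.StringIO()
mapper(io.StringIO("spark hive spark\n"), mapped)
shuffled = io.StringIO("".join(sorted(mapped.getvalue().splitlines(keepends=True))))
reduced = io.StringIO()
reducer(shuffled, reduced)
# reduced.getvalue() == "hive\t1\nspark\t2\n"
```

On a cluster these would run as separate scripts, roughly `hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input <in> -output <out>`; the jar path and options depend on the installation.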
- Developed the application using Spring as the front-end architecture and Hibernate as the data access layer, with WebLogic as the application server and Oracle as the database.
- Designed and developed the system components using Agile software methodology.
- Created Spring Controllers and Integrated with Business Components and View Components.
- Experienced with Jenkins and Maven.
- Developed Spring and Hibernate data layer components for the application.
- Involved in database updates and DDL creation.
- Developed Restful Web Services for accessing Ordering information.
- Helped the UI team to integrate using Spring and RESTFUL Services.
- Created unit test cases using the JUnit testing framework.
- Involved in deploying the application on WebLogic server.
- Used SVN for version control.
- Coordinated with various teams, including support and test teams.
Environment: Java, J2EE (Servlets, JSP, JDBC), Spring, Hibernate, XML, Web Service, Oracle SQL/PLSQL, Jenkins, Maven
Java Backend Developer
- Developed bulk loading module, consumer specific access control model and search functionality for the catalog server in our product.
- Responsible for designing and developing the catalog server UI using XML/XSLT.
- Developed shopping cart functionality (Back end and UI) for the catalog.
- Responsible for meeting the specific performance goals for the catalog server put forth by the customer.
- Deployed the application using the Tomcat web server.
- Involved in Designing the Database Schema and writing the complex SQL queries.
- Accessed the database using JDBC.
- Used Oracle database for the application.
- Expertise and working knowledge of Oracle ORMB and performance tuning.
- Extensively involved in writing stored procedures for data retrieval, storage, and updates in the Oracle database using JDBC.