- Big Data developer with over 8 years of professional IT experience, which includes 4 years’ experience in the field of Big Data.
- Extensive experience working with Hadoop distributions, including enterprise versions of Cloudera, Hortonworks, and MapR, and good knowledge of Amazon EMR.
- In-depth experience using Big Data ecosystem tools such as MapReduce, YARN, Pig, Hive, Sqoop, Spark, Storm, Kafka, Oozie, Elasticsearch, HBase, and ZooKeeper.
- Extensive knowledge of Hadoop architecture and its components.
- Good knowledge of installing, configuring, monitoring, and troubleshooting Hadoop clusters and their ecosystem components.
- Exposure to Data Lake Implementation using Spark.
- Developed data pipelines and applied business logic using Apache Spark.
- Well versed in Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Extensively worked on Spark streaming and Apache Kafka to fetch and transform live stream data.
- Used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Experience in integrating Hive queries into Spark environment using Spark SQL.
- Expertise in performing real-time analytics on big data using HBase and Cassandra.
- Extensive experience in importing and exporting streaming data into HDFS using stream processing platforms like Flume and Kafka.
- Handled importing data from RDBMS into HDFS using Sqoop and vice-versa.
- Experience in developing data pipeline using Pig, Sqoop, and Flume to extract the data from weblogs and store in HDFS.
- Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive.
- Hands-on experience in tools like Oozie and Airflow to orchestrate jobs.
- Proficient in NoSQL databases including HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
- Expertise in Cluster management and configuring Cassandra Database.
- Strong familiarity with creating Hive tables, writing Hive joins and HQL queries, and developing complex Hive UDFs.
- Accomplished in developing Pig Latin scripts and using Hive Query Language for data analytics.
- Worked with different compression codecs (LZO, Snappy, Gzip) and file formats (ORC, Avro, TextFile, Parquet).
- Practical experience implementing AWS technologies, including IAM, Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda, and EBS.
- Built secure AWS solutions by creating VPCs with public and private subnets.
- Worked on data warehousing and ETL tools like Informatica, Talend, and Pentaho.
- Developed web page interfaces using JSP, Java Swing, and HTML.
- Expertise in Java/J2EE, JDBC, ODBC, JSP, Eclipse, JavaBeans, EJB, and Servlets.
- Experience working with the Spring and Hibernate frameworks for Java.
- Worked with various programming languages using IDEs such as Eclipse, NetBeans, and IntelliJ.
- Excelled in using version control tools such as PVCS, SVN, VSS, and Git.
- Developed stored procedures and queries using PL/SQL.
- Development experience with RDBMS such as Oracle, MS SQL Server, Teradata, and MySQL.
- Experience with best practices of web services development and integration (both REST and SOAP).
- Experienced in using build tools like Ant, Gradle, SBT, Maven to build and deploy applications into the server.
- Knowledge of Unified Modeling Language (UML) and expertise in Object-Oriented Analysis and Design (OOAD).
- Experience in complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.
- Solid knowledge of and working experience with Test-Driven Development (TDD).
- Knowledge of creating dashboards and data visualizations using Tableau to provide business insights.
- Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
Big Data Technologies: Hadoop, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Storm, Flume, Spark, Apache Kafka, Zookeeper, Solr, Ambari, Oozie
NoSQL Databases: HBase, Cassandra, MongoDB, Redshift, Redis
Languages: C, C++, Java, Scala, Python, HTML, SQL, PL/SQL, Pig Latin, HiveQL, Unix, JavaScript, Shell Scripting
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB
Application Servers: WebSphere, WebLogic, JBoss, Tomcat
Cloud Computing Tools: Amazon AWS (S3, EMR, EC2, Lambda, VPC, Route 53, CloudWatch), Google Cloud
Databases: Oracle 10g/11g, Microsoft SQL Server, MySQL, DB2
Build Tools: Jenkins, Maven, Ant
Business Intelligence Tools: Tableau, Splunk
Development Tools: Eclipse, IntelliJ, Microsoft SQL Studio, Toad, NetBeans, PyCharm
Development Methodologies: Agile, Waterfall
Big Data/Spark Developer
Confidential, Franklin, TN
- Acquired, analyzed, and documented business requirements as part of the development team.
- Developed numerous data models for data migration from Oracle DB to Cassandra.
- Worked extensively with Scala and Spark SQL for data cleansing, generating DataFrames and transforming them into row DataFrames to populate the aggregate tables in Cassandra.
- Adept at developing generic Spark-Scala methods for transformations and designing schema for rows.
- Adept at writing efficient Spark-Scala code to generate aggregation functions on DataFrames according to business logic.
- Experienced in using the DataStax Spark Cassandra Connector to store data in Cassandra from Spark.
- Worked extensively with Oracle DB and developed Sqoop jobs for data ingestion into the NoSQL database Cassandra.
- Consumed real-time feeds using Kafka and Spark Streaming, converted them to RDDs, and processed the data into DataFrames saved in Parquet format in HDFS.
- Worked on design, optimization, multi-datacenter replication, scalability, security, and monitoring of Kafka infrastructure.
- Assisted in designing and implementing big data solutions integrating with Java applications (messaging, web services integration, stream processing).
- Worked closely with architects to design data models and coding optimizations for a generic data transformation framework (not client specific; usable across various client implementations) using the Kafka Streams API (KStream, KTable, GlobalKTable) and the Kafka Connect API.
- Successfully delivered a transformation framework that supports various transformations and serves as an end-to-end solution for a data transformation project, using a topology integrated with both the Streams DSL and the Processor API.
- Integrated Confluent Schema Registry into the Kafka Streams project to check schema compatibility and manage Avro schemas.
- Implemented Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Coordinated with offshore team members to write test scripts and test cases for numerous user stories.
- Communicated regularly with business and IT leadership.
Environment: HDFS, Spark, Kafka, Hive, Pig, HBase, Cassandra, Java, Scala, Maven.
Sr. Big Data/Spark Developer
Confidential, Addison, TX
- Analyzed business requirements and prepared detailed specifications that follow the project guidelines required for development.
- Used Sqoop to import data from relational databases such as MySQL and Oracle.
- Involved in importing structured and unstructured data into HDFS.
- Responsible for fetching real time data using Kafka and processing using Spark and Scala.
- Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
- Worked on Hive to implement Web Interfacing and stored the data in Hive tables.
- Migrated MapReduce programs into Spark transformations using Spark and Scala.
- Configured Spark Streaming to receive ongoing data from Kafka and store the stream data in HDFS.
- Experienced with SparkContext and Spark SQL.
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Loaded Avro, Parquet, and text files into Spark using Java and Scala, created Spark DataFrames and RDDs to process the data, and saved the output in Parquet format in HDFS for loading into the fact table using ORC Reader.
- Good knowledge in setting up batch intervals, split intervals and window intervals in Spark Streaming.
- Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
- Involved in data querying and summarization using Hive and Pig, and created UDFs, UDAFs, and UDTFs.
- Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
- Extensively used ZooKeeper as a backup server and job scheduler for Spark jobs.
- Knowledge of the MLlib (Machine Learning Library) framework for auto-suggestions.
- Developed traits, case classes, and related constructs in Scala.
- Worked on Cloudera distribution and deployed on AWS EC2 Instances.
- Experienced in loading real-time data into NoSQL databases such as Cassandra.
- Involved in NoSQL (DataStax Cassandra) database design, integration, and implementation; wrote scripts and invoked them using cqlsh.
- Well versed in data manipulation, compaction, and tombstones in Cassandra.
- Experience in retrieving data from a Cassandra cluster by running queries in CQL (Cassandra Query Language).
- Worked on connecting Cassandra database to the Amazon EMR File System for storing the database in S3.
- Well versed in using Elastic Load Balancer with Auto Scaling for EC2 servers.
- Configured workflows that involve Hadoop actions using Oozie.
- Used Python for pattern matching in build logs to format warnings and errors.
Environment: Hadoop YARN, Spark SQL, Spark-Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Solr, Cassandra, Cloudera, Linux.
Confidential, Carlsbad, CA
- Involved in the review of functional and non-functional requirements (NFRs).
- Collected and aggregated large amounts of data from various sources and ingested it into the Hadoop Distributed File System (HDFS) using Sqoop and Flume; transformed the data for business use cases using Pig and Hive.
- Collected and aggregated large amounts of weblogs and unstructured data from different sources such as web servers, network devices using Apache Flume and stored the data into HDFS for analysis.
- Developed and maintained data integration programs in RDBMS and Hadoop environments, with both RDBMS and NoSQL data stores, for data access and analysis.
- Coded multiple MapReduce jobs in Java for data cleaning and processing.
- Responsible for testing and debugging the MapReduce programs.
- Experienced in implementing MapReduce programs to handle semi-structured and unstructured data such as JSON, XML, Avro data files, and sequence files for log files.
- Worked on importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Developed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Used Spark SQL to load data into Hive tables and wrote queries to fetch data from these tables.
- Developed Pig scripts and UDFs as per the business logic.
- Used Pig to import semi-structured data from Avro files to make serialization faster.
- Used Oozie workflows and Java schedulers to manage and schedule jobs on a Hadoop cluster.
- Indexed documents using Elasticsearch.
- Developed Python scripts to find vulnerabilities in SQL queries through SQL injection testing.
- Implemented transformations and data quality checks using Flume Interceptor.
- Experience working with MongoDB for distributed storage and processing.
- Responsible for using a Flume sink to remove data from the Flume channel and deposit it in MongoDB.
- Implemented collections and the aggregation framework in MongoDB.
- Involved in maintaining Hadoop clusters using the Nagios server.
- Configured the Oozie workflow engine to automate MapReduce jobs.
- Collaborated with database, network, application, and BI teams to ensure data quality and availability.
- Good experience in using Python scripts to handle data manipulation.
- Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.
- Experienced in using agile approaches including Test-Driven Development, Extreme Programming, and Agile Scrum.
Environment: Hortonworks HDP, Hadoop, Spark, Flume, Elasticsearch, AWS, EC2, S3, Pig, Hive, Python, MapReduce, HDFS.
Confidential - Peoria, IL
- Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, HBase, and MapReduce.
- Extracted customers' daily transaction data from DB2, exported it to Hive, and set up online analytical processing (OLAP).
- Installed and configured Hadoop, MapReduce, and HDFS clusters.
- Created Hive tables, loaded the data, and performed data manipulations using Hive queries in MapReduce execution mode.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Loaded the structured data resulting from MapReduce jobs into Hive tables.
- Analyzed user request patterns and implemented various performance optimization measures including but not limited to implementing partitions and buckets in HiveQL.
- Identified issues on behavioral patterns and analyzed the logs using Hive queries.
- Analyzed and transformed stored data by writing MapReduce or Pig jobs based on business requirements.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile devices, and network devices, and imported it into HDFS.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig scripts.
- Integrated MapReduce with HBase to import bulk data using MR programs.
- Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
- Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
- Developed data pipeline using Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Used SQL queries, stored procedures, User Defined Functions (UDFs), and database triggers, with tools such as SQL Profiler and Database Tuning Advisor (DTA).
- Installed a cluster, commissioned and decommissioned DataNodes, and performed NameNode recovery, capacity planning, and slots configuration in line with business requirements.
Environment: HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, Flume, HBase, Talend, HiveQL, Java, Maven, Avro, Eclipse and Shell Scripting.
- Developed rules based on different state policies using Spring MVC, iBatis ORM, Spring Web Flow, JSP, JSTL, Oracle, MSSQL, SOA, XML, XSD, JSON, AJAX, and Log4j
- Gathered requirements for, developed, implemented, tested, and deployed enterprise integration patterns (EIP) based applications using Apache Camel and JBoss Fuse
- Developed service classes, domain/DAOs, and controllers using JAVA/J2EE technologies
- Designed and developed using the Apache CXF web service framework
- Worked on the ActiveMQ messaging service for integration
- Worked with SQL queries to store and retrieve the data in MS SQL server
- Performed unit testing using JUnit
- Worked on continuous integration using Jenkins/Hudson
- Participated in all phases of the development life cycle, including analysis, design, development, testing, code reviews, and documentation as needed.
- Involved in configuring Struts and Tiles and developing the configuration files
- Used Eclipse as IDE, Maven for build management, Jira for issue tracking, Confluence for documentation, Git for version control, ARC (Advanced REST Client) for endpoint testing, Crucible for code review, and SQL Developer as DB client.
Environment: Spring Framework, Spring MVC, Spring Web Flow, JSP, JSTL, SOAP UI, rating Engine, IBM Rational Team, Oracle 11g, XML, JSON, Ajax, HTML, CSS, IBM WebSphere Application Server, RAD with sub-eclipse, Jenkins, Maven, SOA, SonarQube, Log4j, Java, JUnit
- Gathered business requirements, analyzed the project, and created UML diagrams such as use cases, class diagrams, sequence diagrams, and flowcharts for the Optimization module using Microsoft Visio
- Configured faces-config.xml for the page navigation rules and created managed and backing beans for the Optimization module.
- Developed enterprise applications using Spring MVC, JSP, and MySQL
- Developed client-side web services components using JAX-WS
- Worked extensively with JUnit to test the application code for server-client data transfer
- Developed and enhanced products in design and in alignment with business objectives
- Used SVN as a repository for managing/deploying application code
- Participated in successful system integration and user acceptance testing
- Developed the front end using JSTL, JSP, HTML, and JavaScript
- Used XML to maintain the Queries, JSP page mapping, Bean Mapping etc.
- Used Oracle 10g as the backend database and wrote PL/SQL scripts.
- Maintained and modified the system based on user feedback using OO concepts
- Implemented database transactions using Spring AOP & Java EE CDI capability
- Enriched organization reputation via fulfilling requests and exploring opportunities
- Performed business analysis, worked on reporting services, and integrated with Sage Accpac (ERP)
- Developed new functionality and maintained existing functionality using Spring MVC and Hibernate
- Developed test cases for integration testing using JUnit
- Created new web pages and maintained existing ones built with JSP and Servlets.
Environment: Java, Spring MVC, Hibernate, MSSQL, JSP, Servlet, JDBC, ODBC, JSF, NetBeans, GlassFish, Spring, Oracle, MySQL, Sybase, Eclipse, Tomcat, WebLogic Server