SUMMARY:
- Around 8 years of experience across several IT fields, including 4 years with Big Data ecosystem technologies.
- Expertise with tools in the Hadoop ecosystem, including HDFS, MapReduce, Hive, Sqoop, Pig, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Worked on major Hadoop distributions such as Cloudera, MapR, and Hortonworks.
- Experience processing structured, semi-structured, and unstructured data using different tools and frameworks in the Hadoop ecosystem.
- Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Extensive experience importing and exporting data using data ingestion and streaming platforms such as Flume and Kafka.
- Analyzed the integration of Kibana with Elasticsearch.
- Experience using Spark components such as Spark Streaming to process both real-time and historical data.
- Knowledge of cloud technologies such as AWS and Amazon Elastic MapReduce (EMR).
- Proficient in OOP concepts and Java features such as generics, multithreading, and collections needed for writing MapReduce jobs.
- Experience in database development using SQL and PL/SQL, and in working with databases such as Oracle 9i/10g, SQL Server, and MySQL.
- Experience with Enterprise Service Bus (ESB) products such as WebSphere Message Broker.
- Experience with source control platforms such as GitHub and Bitbucket.
- Expertise with application and web servers such as WebLogic, IBM WebSphere, and Apache Tomcat, as well as VMware virtualization.
- Deployed and monitored scalable infrastructure on Amazon Web Services (AWS).
- Comprehensive knowledge of the Software Development Life Cycle (SDLC), with a thorough understanding of phases such as requirements, analysis, design, development, and testing.
- Worked within software development methodologies such as Agile and Waterfall, including estimating project timelines.
- Ability to quickly master new concepts and applications.
TECHNICAL SKILLS:
Big Data Technologies: Hadoop (HDFS & MapReduce), Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Kafka, Spark, Spark Streaming, Spark SQL and DataFrames, GraphX, Scala, Elasticsearch and AWS
Programming & Scripting Languages: Java, C, SQL, Python, Impala, Scala, C++, ESQL, PHP
J2EE Technologies: JSP, Servlets, EJB, AngularJS
Web Technologies: HTML, JavaScript
Frameworks: Spring 3.5 - Spring MVC, Spring ORM, Spring Security, Spring Roo, Hibernate, Struts
Application Servers: IBM WebSphere, JBoss, WebLogic
Web Servers: Apache Tomcat
Databases: MS SQL Server & SQL Server Integration Services (SSIS), MySQL, MongoDB, Cassandra, Oracle DB, Teradata
IDEs: Eclipse, NetBeans
Operating Systems: Unix, Windows, Ubuntu, CentOS
Others: PuTTY, WinSCP, Data Lake, Talend, Tableau, GitHub.
PROFESSIONAL EXPERIENCE:
Hadoop/Spark Developer
Confidential, Cumming, GA
Responsibilities:
- Imported and exported large volumes of data between HDFS and RDBMS using Sqoop.
- Performed Extract, Transform, and Load (ETL) processes using Hive.
- Imported data stored in Amazon Web Services into HDFS.
- Provided solutions to ad hoc client requests for data and created ad hoc reports.
- Created Hive tables, loaded them with data, and wrote Hive queries.
- Implemented dynamic partitioning and bucketing in Hive for efficient data access.
- Implemented several workflows using the Apache Oozie framework to automate day-to-day Sqoop tasks.
- Wrote Hive jobs to parse and structure log data, managing and querying it with HiveQL to facilitate effective querying.
- Used ZooKeeper to coordinate and run the different cluster services.
- Used Apache Impala in place of Hive wherever possible to achieve faster results when analyzing data.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs in Scala (see the sketch following this list).
- Experienced with SparkContext, Spark SQL, DataFrames, RDDs, and YARN.
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming for real-time processing.
- Analyzed large structured and unstructured datasets to find patterns and insights, supporting business intelligence through Tableau.
- Coordinated with teams to resolve technical and functional issues.
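A minimal sketch of the Hive-to-Spark conversion described above, assuming Spark 2.x with Hive support enabled; the web_logs table and its status and bytes columns are illustrative assumptions, not details from the project.

    // A rough sketch only: expressing a HiveQL aggregation as Spark RDD transformations in Scala.
    // Assumes Spark 2.x with Hive support; table and column names are made up for illustration.
    import org.apache.spark.sql.SparkSession

    object HiveToSparkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-spark-sketch")
          .enableHiveSupport()              // read table metadata from the Hive metastore
          .getOrCreate()

        // Original HiveQL, for reference:
        //   SELECT status, COUNT(*) AS hits, SUM(bytes) AS total_bytes
        //   FROM web_logs GROUP BY status;

        // The same aggregation as RDD transformations:
        val byStatus = spark.table("web_logs").rdd
          .map(row => (row.getAs[Int]("status"), (1L, row.getAs[Long]("bytes"))))
          .reduceByKey { case ((hits1, bytes1), (hits2, bytes2)) => (hits1 + hits2, bytes1 + bytes2) }

        byStatus.collect().foreach { case (status, (hits, totalBytes)) =>
          println(s"status=$status hits=$hits bytes=$totalBytes")
        }

        spark.stop()
      }
    }

The same query could also run unchanged through spark.sql or be written with the DataFrame groupBy/agg API; the RDD form is shown only because the work above specifically involved RDD transformations.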
Environment: Hadoop, Cloudera, Pig, Hive, Sqoop, Flume, Kafka, Spark, Storm, Tableau, HBase, Scala, Kerberos, Agile, ZooKeeper, Maven, AWS, MySQL.
Hadoop Developer
Confidential, Frisco, TX
Responsibilities:
- Designed, implemented, and deployed the Hadoop cluster.
- Provided solutions to issues using big data analytics.
- Part of the team that built scalable distributed data solutions in a Hadoop cluster environment on the Hortonworks distribution.
- Loaded data into the Hadoop Distributed File System (HDFS) using Kafka and REST APIs.
- Used Sqoop to load data into HDFS from relational database management systems.
- Implemented Talend jobs to load and integrate data from Excel sheets using Kafka.
- Developed custom MapReduce programs and user-defined functions (UDFs) in Hive to transform large volumes of data as required (see the UDF sketch following this list).
- Worked with ORC and Avro file formats and compression techniques such as LZO.
- Used Hive on top of structured data, implementing dynamic partitioning and bucketing of Hive tables.
- Transformed large volumes of structured, semi-structured, and unstructured data and analyzed it using Hive queries and Pig scripts.
- Used Hadoop YARN as the execution environment for data analytics with Hive.
- Worked with MongoDB to develop and implement programs in the Hadoop environment.
- Used the Apache Oozie job scheduler to execute workflows as needed.
- Used Ambari to track node status and job progress and to run analytical jobs on Hadoop clusters.
- Built customized graphical reports, charts, and worksheets in Tableau.
- Filtered datasets with Pig UDFs and Pig scripts in HDFS and with bolts in Apache Storm topologies.
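A minimal sketch of a Hive UDF of the kind mentioned above. Such UDFs are typically written in Java; the JVM class below is written in Scala only to keep all examples in one language, and the class name, function name, and jar path are illustrative assumptions.

    // A rough sketch only: a JVM-based Hive UDF that masks the local part of an email address.
    // Uses the classic org.apache.hadoop.hive.ql.exec.UDF API; names below are illustrative.
    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    class MaskEmail extends UDF {
      // Hive resolves evaluate() by reflection and invokes it once per row.
      def evaluate(input: Text): Text = {
        if (input == null) {
          null                            // preserve NULLs, as Hive built-ins do
        } else {
          val parts = input.toString.split("@", 2)
          if (parts.length == 2) new Text("***@" + parts(1)) else input
        }
      }
    }

    // Registered and used from Hive roughly like this (paths and names are assumptions):
    //   ADD JAR /path/to/custom-udfs.jar;
    //   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
    //   SELECT mask_email(email) FROM customers LIMIT 10;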
Environment: Hadoop, Pig, Hive, HBase, Sqoop, Python, Oozie, ZooKeeper, RHEL, Java, Eclipse, SQL, NoSQL, Talend, Tableau, MongoDB.
Hadoop Developer
Confidential, Portland, ME
Responsibilities:
- Used the Cloudera Hadoop distribution. Involved in all phases of the big data implementation, including requirements analysis, design, and development of the Hadoop cluster.
- Collected and aggregated large sets of data using Apache Flume, staging the data in HDFS for further analysis.
- Designed, built, and supported pipelines for data ingestion, transformation, conversion, and validation.
- Responded quickly to ad hoc internal and external client requests for data and created ad hoc reports.
- Performed different types of joins on Hive tables and applied partitioning, bucketing, and collection concepts in Hive for efficient data access.
- Managed data between different databases, for example ingesting data into Cassandra and consuming the ingested data into Hadoop.
- Created Hive external tables to perform Extract, Transform, and Load (ETL) operations on data generated daily.
- Created HBase tables for random-access queries requested by the business intelligence and other teams.
- Created final tables in Parquet format and used Impala to create and manage the Parquet tables.
- Implemented real-time data ingestion and cluster handling using Apache Kafka.
- Worked on NoSQL databases including HBase and Cassandra.
- Participated in the development and implementation of a Cloudera Impala Hadoop environment.
- Developed a NoSQL database using CRUD operations, indexing, replication, and sharding in MongoDB (see the sketch following this list).
- Developed the data model to manage the summarized data.
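A minimal sketch of MongoDB CRUD and indexing of the kind described above, using the MongoDB Java sync driver (3.7+) from Scala; the connection string, database, collection, and field names are illustrative assumptions, not details from the project.

    // A rough sketch only: MongoDB CRUD plus an index, via the Java sync driver from Scala.
    // The URI, database, collection, and field names are illustrative assumptions.
    import com.mongodb.client.MongoClients
    import com.mongodb.client.model.{Filters, Indexes, Updates}
    import org.bson.Document

    object MongoCrudSketch {
      def main(args: Array[String]): Unit = {
        val client = MongoClients.create("mongodb://localhost:27017")
        val summaries = client.getDatabase("analytics").getCollection("daily_summary")

        // Index the lookup field so reads by day stay fast as the collection grows.
        summaries.createIndex(Indexes.ascending("day"))

        // Create (explicit boxing because the driver's append() takes java.lang.Object)
        summaries.insertOne(new Document("day", "2016-03-01").append("events", java.lang.Long.valueOf(1200L)))

        // Read (backticks because `eq` clashes with Scala's AnyRef.eq)
        val byDay = Filters.`eq`("day", "2016-03-01")
        println(summaries.find(byDay).first())

        // Update and delete
        summaries.updateOne(byDay, Updates.inc("events", java.lang.Long.valueOf(50L)))
        summaries.deleteOne(byDay)

        client.close()
      }
    }

Replication and sharding are cluster-level concerns configured on the MongoDB deployment itself rather than in application code, so they are not shown here.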
Environment: Hadoop, Cloudera, Hive, Java, Python, Parquet, Oozie, Cassandra, ZooKeeper, HiveQL/SQL, MongoDB, Tableau, Impala.
Network Engineer
Confidential
Responsibilities:
- Established the networking environment by designing system configurations and directing system installation.
- Enforced system standards and defined protocols.
- Maximized network performance by monitoring and troubleshooting network problems and outages.
- Set up policies for data security and network optimization.
- Maintained the clusters needed for data processing, especially for big data workloads where many servers are set up on a complex yet efficient network.
- Reported network operational status by gathering, filtering, and prioritizing the information necessary for optimal network upkeep.
- Kept the budget low by using available resources efficiently and tracking data transfer and processing speeds.
- Additional areas: budget expense tracking, project management, problem solving, LAN knowledge, proxy servers, networking knowledge, network design and implementation, network troubleshooting, network hardware configuration, network performance tuning, and people management.
Environment: Wireshark, GNS3, Hadoop, IP addressing, VPN, VLAN, Network Protocols.
Jr. Java Developer
Confidential
Responsibilities:
- Analyzed requirements and specifications in an Agile-based environment.
- Developed web interfaces for the User and Admin modules using JSP, HTML, XML, CSS, JavaScript, AJAX, and Action Servlets with the Struts framework.
- Worked extensively with core Java (collections, generics, and interfaces for passing data from the GUI layer to the business layer).
- Analyzed and designed the system based on OOAD principles.
- Used WebSphere Application Server to deploy the build.
- Developed, tested, and debugged the application in Eclipse.
- Used DOM Parser to parse the XML files.
- Used the Log4j framework for logging debug, info, and error data.
- Used Oracle 10g Database for data persistence.
- Transferred files from the local system to other systems using WinSCP.
- Performed Test Driven Development (TDD) using JUnit.
Environment: J2EE, HTML, JSON, JavaScript, CSS, Struts, Spring, Hibernate, Eclipse, Oracle 10g, SQL, XML, CVS.