- 8 years of overall IT experience across a variety of industries, including 3+ years of hands-on experience with Big Data technologies and 4+ years of extensive experience in Java.
- Good understanding of classic Hadoop (MRv1) and YARN architecture, along with the various Hadoop daemons such as JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, ResourceManager, NodeManager, ApplicationMaster and Containers.
- Hands-on experience working with Hadoop ecosystem components such as Hive, Pig, Sqoop, MapReduce, Flume, Oozie, ZooKeeper, HDFS, HBase and Spark.
- Strong knowledge of Pig and Hive analytical functions, and of extending Hive and Pig core functionality by writing custom UDFs.
- Experience importing and exporting terabytes of data with Sqoop between HDFS and relational database systems.
- Developed multi-tab reports and dashboards using the TIBCO Spotfire 5.5 suite and published them on Web Player.
- Involved in all phases of the SDLC, from project proposal, planning and analysis through development, testing, deployment and support.
- Good experience with databases, including writing complex queries and stored procedures in SQL and PL/SQL.
- Experience developing and implementing web applications using Java, JSP, CSS, HTML, HTML5, XHTML, JavaScript, JSON, XML and JDBC.
- Good experience in general data analytics on distributed computing clusters such as Hadoop using Apache Spark, Impala and Scala.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
- Working knowledge of Python and shell scripting.
- Very good experience with both MapReduce 1 (JobTracker) and MapReduce 2 (YARN) setups.
- Created reports for users with Tableau Desktop by connecting to multiple data sources such as flat files, MS Excel, CSV files, SQL Server and Oracle.
- Detailed knowledge of and experience in designing, developing and testing software solutions using Java and J2EE technologies.
- Good hands-on experience creating RDDs and DataFrames for the required input data and performing data transformations using Spark with Scala.
- Research-oriented, motivated and proactive self-starter with strong technical, analytical and interpersonal skills.
Hadoop Components: HDFS, Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Impala, ZooKeeper, Flume, Kafka, YARN, Cloudera Manager, Kerberos.
Spark Components: Apache Spark, DataFrames, Spark SQL, YARN, Pair RDDs.
Scripting: UNIX Shell Scripting.
Databases: Oracle 10g, Microsoft SQL Server, MySQL, DB2, Teradata
Programming Languages: Java, C, C++, Scala, Python.
Web Servers: Apache Tomcat, BEA WebLogic.
IDE: Eclipse, Dreamweaver
OS/Platforms: Windows 2005/2008, Linux (all major distributions), UNIX.
NoSQL Databases: HBase, MongoDB.
Methodologies: Agile (Scrum), Waterfall, UML, Design Patterns, SDLC.
Currently Exploring: Apache Flink, Drill, Tachyon.
Confidential, Dallas, TX
Hadoop / Spark Developer
- Responsible for building scalable distributed data solutions using Hadoop.
- Job duties included design and development of various modules in the Hadoop Big Data platform and processing data using MapReduce, Hive, Sqoop, Pig and Oozie.
- Developed job-processing scripts using Oozie workflows.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
- Involved in converting Hive queries into Spark transformations using Spark RDDs, Python and Scala.
- Worked with Apache Hadoop, Spark and Scala.
- Used the DataFrame API in Scala to process distributed collections of data organized into named columns.
- Worked with TIBCO Spotfire Statistical Services
- Developed Java Mapper and Reducer programs for complex business requirements.
- Developed custom Java record readers, partitioners and serialization logic.
- Worked extensively with Spark DataFrames, Spark data sources, Spark SQL and Spark Streaming using Scala.
- Created Tableau data extracts to improve query performance and enable efficient in-memory data access on the Tableau Data Engine.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Used Talend Open Studio to load files into Hadoop Hive tables and performed ELT aggregations in Hive.
- Worked on upgrading and configuring Pivotal Cloud Foundry from version 1.7 to version 1.8, and on Hive and Java/Python MapReduce applications for analytics and machine learning at scale.
- Responsible for Software Development Life Cycle (SDLC) processes including analysis, design, coding, testing and documentation.
- Created a GUI in ASP.NET using .NET controls and C#, and wrote common user controls (*.ascx).
- Used Windows Presentation Foundation (WPF) for web UI enhancements.
- Coded in C#, ASP.NET and HTML, with client-side validation in JavaScript.
- Retrieved data from a SQL Server database and bound it to a Repeater control.
- Responsible for writing SQL queries based on complex business logic.
- Involved in the complete software development lifecycle.
- Designed and created ETL jobs in Talend to load large volumes of data into Cassandra, the Hadoop ecosystem and relational databases.
- Worked extensively with Sqoop to import metadata from Oracle, and used Sqoop to import data from SQL Server to Cassandra.
- Performed event- and time-based scheduling of TIBCO Spotfire reports and dashboards.
- Implemented Spark RDD transformations and actions for business analysis, and worked with Spark accumulators and broadcast variables.
- Optimized reports and TIBCO Spotfire data files for performance.
- Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment, working with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
- Ran Hadoop Streaming jobs to process terabytes of XML data.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Wrote Hive queries and UDFs.
- Developed Hive queries to process the data and generate the data cubes for visualizing.
Environment: MapReduce, Spark, HDFS, Pig, HBase, Oozie, Kafka, ZooKeeper, Sqoop, Cassandra, Linux, XML, C#, ASP.NET, ADO.NET, Web Services, Hadoop, Toad, Tableau, Maven, NoSQL, MySQL, Hive, Java, JavaScript, Eclipse, Oracle 10g, ETL, Python.
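As an illustration of the custom partitioner work listed for this role, the bucket assignment a hash-based MapReduce partitioner performs can be sketched in plain Java. This is a hypothetical sketch, not the project's code: the class `HashPartitionSketch` and the method `partitionFor` are illustrative names, and the formula assumes the one used by Hadoop's default `HashPartitioner` (clear the sign bit of the key's hash, then take the remainder by the number of reduce tasks). No Hadoop dependency is used so the logic runs standalone.

```java
/**
 * Hypothetical sketch of hash partitioning as performed by a MapReduce
 * partitioner. Assumes the same formula as Hadoop's default HashPartitioner;
 * it does not depend on Hadoop so it can run on its own.
 */
public class HashPartitionSketch {

    /** Returns the reduce-task index (0 .. numReduceTasks - 1) for a key. */
    public static int partitionFor(Object key, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            throw new IllegalArgumentException("numReduceTasks must be positive");
        }
        // Masking the sign bit keeps the result non-negative, even for
        // Integer.MIN_VALUE, where Math.abs would still return a negative number.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Hypothetical keys: equal keys always land in the same partition,
        // so a single reducer sees every value for a given key.
        String[] keys = {"user-1", "user-2", "user-1"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

A custom partitioner would typically swap `key.hashCode()` for a domain-specific grouping rule (hypothetical here) while keeping the result within 0 to `numReduceTasks - 1`.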
- Developed Sqoop scripts to unload data from SQL Server.
- Built Oozie scripts to execute the Sqoop scripts.
- Scheduled the ETL process in Autosys.
- Fixed queries that failed due to the conversion from Oracle to Hadoop.
- Fixed data issues caused by changes in environments.
- Coordinated with the Release Management and Change Control teams on implementation.
- Worked with the Marketing team on data validation.
- Tested in SIT and UAT before installing the code in production.
- Ran the scripts in parallel with the Oracle process to validate data.
Environment: HDFS, Hive, Sqoop, Flume, Oozie, CDH 4.x, Linux, Autosys, Python, Java, Oracle.
Confidential, Bloomington, IL
- Worked on analyzing the Hadoop stack and various big data analytic tools, including Pig, Hive, the HBase database and Sqoop.
- Designed a high-level ETL architecture for overall data transfer from OLTP to OLAP.
- Installed and configured Pig for ETL jobs.
- Wrote MapReduce jobs to perform operations such as copying data on HDFS and defining job flows on EC2 servers, and to load and transform large sets of structured, semi-structured and unstructured data.
- Developed PL/SQL procedures for processing business logic in the database.
- Imported data from Teradata with Sqoop using the Teradata connector.
- Worked on a POC for parallel processing with Spark and Scala.
- Implemented real-time data streaming using Spark with Kafka.
- Experience with core distributed computing and data mining libraries in Apache Spark.
- Used Hive for data processing and batch data filtering, and Spark for other value-centric data filtering.
- Developed new Tableau dashboards for Contracts with multiple connections.
- Assigned a name to each column using Scala case classes.
- Wrote complex Hive queries and UDFs in Java and Python.
- Monitored and identified performance bottlenecks in ETL code.
- Worked on data using a Hadoop, ZooKeeper and Accumulo stack, helping develop specialized indexes for performant queries on big data implementations.
- Used ZooKeeper for centralized configuration, SVN for version control, Maven for project management and Jira for internal bug/defect tracking.
- Installed the operating system on Solaris and Linux servers and blades over the network.
- Gained good experience with NoSQL databases.
- Hands-on experience publishing various kinds of interactive data visualizations, dashboards and workbooks from Tableau Desktop to Tableau Server and web pages.
- Worked with MongoDB concepts such as locking, transactions, indexes, sharding, replication and schema design.
- Configured high availability using geographically distributed MongoDB replica sets across multiple data centers.
- Generated Java APIs for data retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Designed and implemented a MapReduce-based, large-scale parallel relation-learning system; installed and benchmarked Hadoop/HBase clusters for internal use.
Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Oozie, Sqoop, Kafka, ZooKeeper, MongoDB, MapReduce, Cassandra, Linux, XML, Toad, Maven, NoSQL, MySQL Workbench, Java 6, Eclipse, Oracle 10g, PL/SQL, SQL*Plus.