- 7+ years of development experience using Hadoop, Java and Oracle, including Big Data ecosystem design, development and administration.
- Over 3 years of extensive experience in Big Data, with a strong understanding of Hadoop architecture and related components such as Spark SQL, HDFS, Pig, Hive, HBase, Sqoop, Flume, YARN, ZooKeeper, Kafka and Cassandra.
- Experience in loading structured, semi-structured and unstructured data into Hadoop from sources such as CSV and XML files, Teradata, MS SQL Server and Oracle.
- Experience in importing and exporting data in different formats between HDFS/HBase and various RDBMS databases.
- Experience in writing Scala programs.
- Exposure to the Python programming language.
- Expertise in working with distributed and global project teams.
- Experience in using various Hadoop distributions like Cloudera, Hortonworks.
- Good exposure to the YARN environment with Spark and Kafka, and to file formats such as Avro, JSON, XML and sequence files.
- Experience writing custom UDFs in Pig and Hive based on user requirements.
- Experience in storing and processing unstructured data using NoSQL databases such as HBase, Cassandra and MongoDB.
- Experience in writing workflows and scheduling jobs using Oozie.
- Involved in project planning, setting up standards for implementation and design of Hadoop based applications.
- Experience working independently and end to end on projects.
- Proficiency in creating business and technical project documentation.
- Ability to lead a team and develop a project from scratch.
Hadoop/Big Data: Apache Spark, HDFS, MapReduce, Hive, Pig, Oozie, Flume, ZooKeeper, Sqoop, HBase, Cassandra, Spark Streaming, Kerberos, Zeppelin
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts, Perl, Python, Scala, R
Operating Systems: Sun Solaris, UNIX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Databases: Oracle, SQL Server, MySQL, Netezza
Tools and IDE: Eclipse, NetBeans, IntelliJ, Maven, SBT, JDeveloper, DbVisualizer, Toad, SQL Developer
Version control: SVN, Git, Bitbucket
Big Data Developer
Environment: Spark, Spark SQL, Hadoop HDFS, Pig, Sqoop, Hive, Oozie, MySQL, Scala, Talend, Autosys
- Involved in complete project life cycle starting from design discussion to production deployment.
- Performed transformations, cleaning and filtering on imported data using Hive and MapReduce.
- Developed Spark Core and Spark SQL scripts using Scala for faster data processing.
- Extensively used big data analytical and processing tools (Hive, Spark Core, Spark SQL) for batch processing of large data sets on the Hadoop cluster.
- Used Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, RDDs and YARN.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Imported and exported data between environments such as MySQL and HDFS, and deployed to production.
- Used Pig as an ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Worked on partitioning and bucketing in Hive tables and set tuning parameters to improve performance.
- Involved in developing Impala scripts to run ad hoc queries.
- Used Oozie workflow scheduler templates to manage various jobs such as Sqoop, MapReduce, Pig, Hive and shell scripts.
- Used SVN extensively as a code repository to manage the day-to-day agile project development process and to keep track of issues and blockers.
- Developed Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle, Netezza and Teradata to HDFS.
- Used Hadoop FS actions to move the data from upstream location to local data lake locations.
- Created a common data lake for the migrated data to be used by other members of the team.
- Wrote extensive Hive queries to transform the data for use by downstream models.
- Developed MapReduce programs as part of predictive analytical model development.
- Developed Hive queries to analyze the data and generate the end reports to be used by business users.
Confidential, Atlanta, GA
Environment: Spark, Spark SQL, Hadoop HDFS, Pig, Sqoop, Hive, Flume, Oozie, MySQL, Scala
- Migrated jobs from Sqoop and Pig to Spark SQL for faster processing.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Developed Spark code and Spark SQL for faster testing and processing of data.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Analyzed large data sets by running Hive queries and Pig scripts
- Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Scheduled and managed cron jobs and wrote shell scripts to generate alerts.
- Prepared design documents and functional documents.
- Added nodes to the cluster as requirements dictated to make it scalable.
- Involved in running Hadoop jobs for processing billions of records of text data.
- Involved in loading data from the local file system (Linux) to HDFS.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
- Assisted in exporting analyzed data to relational databases using Sqoop
- Created and maintained technical documentation for launching Hadoop clusters and for executing
Confidential, San Francisco, CA
Environment: Hadoop, Hive, Impala, Oracle, Spark, Scala, Pig, Netezza, Sqoop, Oozie, VersionOne, Shell
- Handled importing of data from various data sources, performed transformations using Hive, Spark and loaded data into HDFS.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Involved in analysis, design and testing phases, and responsible for documenting technical specifications.
- Experience with the Hadoop Distributed File System on Cloudera.
- Developed and executed shell scripts to automate the jobs
- Wrote complex Hive queries and UDFs.
- Worked on reading multiple data formats on HDFS using Spark.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Developed multiple POCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed a solution to implement them using Spark.
- Used the Autosys automation tool to schedule Oozie jobs based on calendar and file watcher jobs.
- Facilitated daily scrum meetings, sprint planning, sprint reviews and sprint retrospectives.
- Worked extensively on the Spark Core and Spark SQL modules.
- Responsible for understanding the scope of the project and requirement gathering.
- Developed the web tier using JSP, Struts MVC to show account details and summary.
- Created and maintained the configuration of the Spring Application Framework.
- Implemented various design patterns - Singleton, Business Delegate, Value Object and Spring DAO.
- Used Spring JDBC to write some DAO classes which interact with the database to access account information.
- Mapped business objects to database using Hibernate.
- Involved in writing Spring configuration XML files that contain declarations of objects and their dependent objects.
- Used the Tomcat web server for development purposes.
- Involved in creation of Test Cases for Unit Testing.
- Used Oracle as the database and Toad for query execution; also wrote SQL scripts and PL/SQL code for procedures and functions.
- Used CVS and Perforce as configuration management tools for code versioning and release.
- Developed the application using Eclipse and used Maven as the build and deploy tool.
- Used Log4j to print logging, debugging, warning and info messages to the server console.
- Involved in the analysis, design, implementation, and testing of the project
- Designed the functional specifications and architecture of the web-based module using Java Technologies.
- Created Design specification using UML Class Diagrams, Sequence & Activity Diagrams
- Developed the web application using MVC architecture, Java, JSP, Servlets and Oracle Database.
- Developed various Java classes, SQL queries and procedures to retrieve and manipulate data from the backend Oracle RDBMS using JDBC.
- Worked extensively with JavaScript for front-end validations.
- Analyzed business requirements and developed the system architecture document for the enhancement project.
- Provided impact analysis and test cases.
- Involved in writing the JDBC connectivity code to interact with the back-end database.
- Designed tables and indexes.
- Wrote complex SQL queries and stored procedures.
- Designed tables and indexes and involved in writing the DAO interaction layer as per the requirements.
- Designed, implemented, tested and deployed an enterprise application using WebLogic as the application server.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Actively involved in the system testing.
- Involved in implementing the service layer using the Spring IoC module.