- 8+ years of overall experience with a strong emphasis on the design, development, implementation, testing and deployment of software applications.
- 8+ years of comprehensive IT experience in Big Data and Big Data analytics, Hadoop, HDFS, MapReduce, YARN, the Hadoop ecosystem and shell scripting.
- Highly capable of processing large structured, semi-structured and unstructured datasets and supporting Big Data applications.
- Expertise in transferring data between the Hadoop ecosystem and structured data storage in an RDBMS such as MySQL, Oracle, Teradata and DB2 using Sqoop.
- Experience in Apache Spark cluster and stream processing using Spark Streaming.
- Expertise in moving large amounts of log data, streaming event data and transactional data using Flume.
- Experience in developing MapReduce jobs in Java for data cleaning and pre-processing.
- Expertise in writing Pig Latin and Hive scripts and extending their functionality using user-defined functions (UDFs).
- Good knowledge of Hadoop, HBase, Hive, Pig Latin scripts, MapReduce, Sqoop, Flume and HiveQL.
- Experience in analyzing data using Pig Latin, HiveQL and HBase.
- Experience in capturing data from existing databases that provide SQL interfaces using Sqoop.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Implemented proofs of concept on the Hadoop stack and various Big Data analytics tools, including migration from different databases (e.g. Teradata, Oracle, MySQL) to Hadoop.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Successfully loaded files to Hive and HDFS from MongoDB and HBase.
- Experience in configuring Hadoop Clusters and HDFS.
- Expertise in organizing data layouts in Hive using partitioning and bucketing.
- Expertise in preparing interactive data visualizations in Tableau from different data sources.
- Hands-on experience in developing workflows that execute MapReduce, Sqoop, Pig, Hive and shell scripts using Oozie.
- Experience working with the Cloudera Hue interface and Impala.
- Expertise in developing SQL queries and stored procedures, and excellent development experience with Agile methodology.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
- Excellent leadership, interpersonal, problem solving and time management skills.
- Excellent communication skills, both written (documentation) and verbal (presentation).
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie and ZooKeeper
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: C, Python, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts, R Programming
Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery
Frameworks: MVC, Struts, Spring, Hibernate
Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
Tools and IDEs: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DbVisualizer
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
- Involved in end-to-end data processing, including ingestion, processing, quality checks and splitting.
- Developed Spark scripts in Scala as per requirements.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Performed different types of transformations and actions on RDDs to meet the business requirements.
- Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
- Analyzed the Hadoop cluster and various Big Data analytics tools, including Pig, HBase and Sqoop.
- Involved in loading data from the UNIX file system to HDFS.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented best offer logic using Pig scripts and Pig UDFs.
- Responsible for managing data coming from various sources.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Used ZooKeeper for cluster coordination services.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Responsible for setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Involved in managing and reviewing Hadoop log files.
- Imported data from MySQL into HDFS using Sqoop on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Responsible for writing Hive queries for data analysis to meet the business requirements.
- Responsible for creating Hive tables and working on them using HiveQL.
- Responsible for importing and exporting data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
- Developed a parser and loader MapReduce application to retrieve data from HDFS and store it in HBase and Hive.
- Imported unstructured data into HDFS using Flume.
- Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
- Used the HBase Java API in a Java application.
- Automated all jobs, from extracting data from sources such as MySQL to pushing the result sets to the Hadoop Distributed File System.
- Hands-on design and development of an application using Hive UDFs.
- Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
- Supported data analysts in running Pig and Hive queries.
- Worked with HiveQL and Pig Latin.
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
- Responsible for defining data flows within the Hadoop ecosystem and directing the team in implementing them.
Environment: Hadoop, MapReduce 2.7.2, Hive 2.0, Pig 0.16, Sqoop 2, Java, Oozie, HBase 0.98.19, Kafka 0.10.1.1, Spark 2.0, Scala 2.12.0, Eclipse, Linux, Oracle, Teradata.
Confidential, San Francisco, CA
- Worked on the Hortonworks HDP 2.5 distribution.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in importing data from Microsoft SQL Server, MySQL and Teradata into HDFS using Sqoop.
- Played a key role in dynamic partitioning and bucketing of the data stored in Hive.
- Wrote HiveQL queries that integrate different tables to create views and produce result sets.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Experienced in loading and transforming large sets of structured and unstructured data.
- Used MapReduce programs for data cleaning and transformation and loaded the output into Hive tables in different file formats.
- Wrote MapReduce programs to handle semi-structured and unstructured data such as JSON, Avro data files and sequence files for log files.
- Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into an HDFS location.
- Involved in loading data into the HBase NoSQL database.
- Built, managed and scheduled Oozie workflows for end-to-end job processing.
- Experienced in extending Hive and Pig core functionality by writing custom UDFs using Java.
- Analyzed large volumes of structured data using Spark SQL.
- Wrote a shell script to execute HiveQL.
- Used Spark as an ETL tool.
- Wrote automated shell scripts in Linux/Unix environments using Bash.
- Migrated HiveQL queries to Spark SQL to improve performance.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded it into HBase.
- Experienced in using the DataStax Spark connector to store data in and retrieve data from Cassandra.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded it into Cassandra.
Environment: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, ZooKeeper, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL.
- Developed a process for importing data with Sqoop from multiple sources such as SQL Server, Oracle and Teradata.
- Responsible for creating the mapping document from source fields to destination fields.
- Developed a shell script to create staging and landing tables with the same schema as the source and to generate the properties used by Oozie jobs.
- Developed Oozie workflows for executing Sqoop and Hive actions.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Involved in building database models, APIs and views in Python to build an interactive web-based solution.
- Optimized Spark/Scala performance; diagnosed and resolved performance issues.
- Responsible for developing Python wrapper scripts that extract a specific date range using Sqoop by passing the custom properties required for the workflow.
- Developed scripts to run Oozie workflows, capture the logs of all jobs that run on the cluster and create a metadata table that records the execution time of each job.
- Developed Hive scripts for performing transformation logic and loading data from the staging zone to the final landing zone.
- Developed monitoring and notification tools using Python.
- Worked with the Parquet file format to improve storage and performance for publish tables.
- Involved in loading transactional data into HDFS using Flume for Fraud Analytics.
- Developed a Python utility to validate HDFS tables against source tables.
- Designed and developed UDFs to extend functionality in both Pig and Hive.
- Imported and exported data between MySQL and HDFS using Sqoop on a regular basis.
- Managed datasets using pandas DataFrames and MySQL; queried the MySQL database from Python using the Python-MySQL connector and the MySQLdb package to retrieve information.
- Automated all the jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
- Involved in developing Spark code in Scala and Spark SQL for faster testing and processing of data, and explored optimizing it using SparkContext, Spark SQL, pair RDDs and Spark on YARN.
- Migrated the required data from Oracle and MySQL into HDFS using Sqoop and imported flat files in various formats into HDFS.
Environment: Hadoop, HDFS 2.6.3, Hive 1.0.1, HBase 0.98.12.1, ZooKeeper 3.5.1, Oozie, Impala 1.4.1, Java (JDK 1.6), Cloudera CDH 3, Oracle, Teradata, SQL Server, UNIX Shell Scripting, Flume 1.6.0, Scala 2.11.6, Spark 1.5.0, Sqoop 1.4.6, Python 3.5.1.
- Responsible for understanding the scope of the project and gathering requirements.
- Created the database, user, environment, activity and class diagrams for the project (UML).
- Implemented the database using the Oracle database engine.
- Created an entity object (business rules and policy, validation logic, default value logic, security).
- Developed web applications using J2EE, JSP, Servlets, JDBC, JavaBeans, Struts, Ajax, custom tags, EJB, Hibernate, Ant, JUnit, Apache Log4j, Web Services and Message Queue (MQ).
- Created applications and connection pools, and deployed JSPs and Servlets.
- Used Oracle and MySQL databases for storing user information.
- Developed the back end for web applications using PHP.
- Hands-on experience in all phases of the software development life cycle (SDLC).
- Used Eclipse as the IDE; configured and deployed the application onto the WebLogic application server using Maven build scripts to automate the build and deployment process.
- Developed UML diagrams using Rational Rose.
- Created UIs for web applications using HTML and CSS.
- Created desktop applications using J2EE and Swing.
- Developed the process using the Waterfall model.
- Created SQL scripts for Oracle database.
Environment: Java, Servlets, JSF, ADF Rich Client UI framework, ADF-BC (BC4J) 11g, Web Services using Oracle SOA, Oracle WebLogic.