- 7+ years of IT experience, including 4+ years as a Hadoop/Spark developer working with Big Data technologies across the Hadoop and Spark ecosystems, and 3 years of database administration and application development using J2EE.
- Experience as a Hadoop developer with good knowledge of Spark, Scala, YARN, Pig, Hive, Sqoop, Impala, Kafka and HBase.
- Good working knowledge of Amazon Web Services (AWS) components such as EC2, EMR, S3 and IAM.
- Experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning and monitoring.
- Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Hands-on experience across Big Data application phases such as data ingestion, data analytics and data visualization.
- Extended Hive and Pig core functionality by writing custom User Defined Functions (UDFs).
- Knowledge of job/workflow scheduling and monitoring with Oozie and of cluster coordination with ZooKeeper.
- Experience in analyzing data using HiveQL and Pig Latin.
- Hands-on experience in application development using Java, Scala, Spark and Linux shell scripting.
- Extracted data from Oracle 11g to HDFS using Sqoop.
- Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping and aggregation.
- Familiar with Kafka and with running Spark in YARN, local and standalone modes.
- Documented and explained the processes and configurations implemented during upgrades.
- Expertise in creating Hive internal/external tables and views, using Hive's analytical functions and writing HQL scripts.
- Experience using accumulator variables, broadcast variables and RDD caching in Spark Streaming applications.
- Good knowledge of Apache Spark components, including Spark SQL and Spark Streaming.
- Experience with SequenceFile, ORC, Avro and Parquet file formats.
- Experienced in implementing schedulers using Oozie, crontab and shell scripts.
- Experience with SVN and Git for code management and version control in collaborative projects.
- Installed and configured a 10-node Hortonworks Hadoop cluster in a test environment using Amazon EC2 and EBS storage volumes.
- Possess strong knowledge on Oracle Database Administration.
- Experience in the installation and maintenance of Oracle RAC databases in production.
- Experience in tuning SQL queries to optimize performance.
- Automated routine DBA monitoring tasks to reduce day-to-day effort and eliminate human error.
- Used project-management tools such as BMC and JIRA for handling service requests and tracking issues.
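The custom Hive UDFs mentioned above can be sketched as plain transformation logic; here is a minimal example in Scala, assuming a hypothetical account-masking function (the name, the masking rule and the Hive class names are illustrative, not taken from the actual projects):

```scala
// Core logic of a hypothetical Hive UDF: mask all but the last four
// characters of an account number. Pure Scala, so it can be unit-tested
// without a Hive runtime.
def maskAccount(acct: String): String =
  if (acct == null || acct.length <= 4) acct
  else "*" * (acct.length - 4) + acct.takeRight(4)

// In Hive this logic would be wrapped in a class extending
// org.apache.hadoop.hive.ql.exec.UDF (requires hive-exec on the classpath):
//
//   class MaskAccount extends UDF {
//     def evaluate(acct: Text): Text =
//       if (acct == null) null else new Text(maskAccount(acct.toString))
//   }
//
// and registered in HiveQL with:
//   CREATE TEMPORARY FUNCTION mask_account AS 'com.example.MaskAccount';
```

Keeping the transformation as a standalone function and wrapping it thinly for Hive makes the same logic reusable from Pig or Spark jobs.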
Big Data Ecosystem: HDFS, YARN, Spark, Hive, Pig, Kafka, Oozie, ZooKeeper, Sqoop, Impala, Spark Streaming, Spark SQL
Cloud Technologies: AWS (Amazon Web Services) EC2, S3, IAM, CloudWatch, DynamoDB, SNS, SQS, EMR, Kinesis
Database Tools: OEM, RMAN, Oracle NETCA, Data Pump, DBCA, Data Guard
Databases: Oracle 11g, HBase, MongoDB, MySQL
Big Data Distributions: Hortonworks, Cloudera
Java/J2EE Technologies: XML, JUnit, JDBC, AJAX, JSON, JSP
Operating Systems: Linux, Windows, Kali Linux
IDE/Build Tools: Eclipse, IntelliJ, Sublime, Maven, SBT, Ant
Data Visualization: Tableau
Version Control: Git, SVN
SDLC: Agile/SCRUM, Waterfall
Big Data Developer
- Developed a detailed understanding of the existing build system and related tools, covering product and release information and test results.
- Developed Spark programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Used Spark SQL to load JSON data, create schema RDDs, load them into Hive tables and handle structured data.
- Converted long-running MapReduce jobs and Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Optimized Spark applications, including choosing the right batch interval and tuning memory.
- Created Hive external tables, loaded data and wrote Hive queries (executed as MapReduce jobs) to analyze the data.
- Used JSON and XML SerDes for serialization and deserialization when loading JSON and XML data into Hive tables.
- Experience with HDFS file formats such as Avro, RC, ORC, Parquet and SequenceFile, and with compression codecs such as Snappy.
- Coordinated Pig and Hive scripts using Oozie workflows.
- Developed Spark jobs using Scala and Spark-SQL/Spark-Streaming for testing data samples and faster processing.
- Implemented Sqoop jobs to import data from Oracle into Hadoop in Parquet format.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, RDBMS and a variety of portfolios.
- Assisted with data capacity planning and node forecasting.
- Tested raw data and executed performance scripts.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Environment: HDFS, Spark, MapReduce, Hive, Sqoop, HBase, Oozie, Scala, Java, JSON, SQL, Linux, shell scripting, Cloudera.
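The raw-data parsing behind the staging-table work above can be sketched as a pure Scala function; the record layout, field names and table names are assumptions for illustration, and the Spark calls (which need a SparkSession) are shown only in comments:

```scala
// Hypothetical pipe-delimited raw record: id|event_date|amount
case class StagingRow(id: Long, eventDate: String, amount: Double)

def parseRaw(line: String): Option[StagingRow] =
  line.split('|') match {
    case Array(id, date, amt) =>
      try Some(StagingRow(id.trim.toLong, date.trim, amt.trim.toDouble))
      catch { case _: NumberFormatException => None } // drop malformed fields
    case _ => None // drop records with the wrong column count
  }

// With a SparkSession in scope, the cleaned rows would be written to a
// partitioned table roughly like:
//   spark.sparkContext.textFile("hdfs:///raw/events")
//     .flatMap(parseRaw).toDF()
//     .write.partitionBy("eventDate").saveAsTable("edw.stg_events")
```

Returning `Option` keeps bad records out of the staging table without failing the whole job, which matters when raw feeds are dirty.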
Confidential, San Jose, CA
- Possess in-depth knowledge of Hadoop architecture and components such as HDFS, Application Master, Node Manager, Resource Manager, NameNode, DataNode and MapReduce.
- Collaborated with Architects to design Spark model for the existing MapReduce model and migrate them to Spark models using Scala.
- Imported required tables from RDBMS to HDFS using Sqoop and used Kafka to ingest real-time streaming data into HBase.
- Experience in analyzing/manipulating large datasets and finding patterns and insights within structured and unstructured data.
- Developed Hive queries and UDFs to transform data and write the results back to HDFS after analysis.
- Experienced in creating HBase tables to load large sets of unstructured data from various media sources.
- Worked on Hive partition and bucketing concepts and created Hive external tables with Hive partitions.
- Developed Pig Latin scripts to extract data from web-server output files and load it into HDFS.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Hands-on experience with Amazon EC2, S3, EMR, IAM and CloudWatch for compute and storage.
- Imported and exported data into HDFS and Hive using Sqoop.
- Used Pig for transformations and pre-aggregations before storing data in HDFS.
- Migrated HQL queries to Impala to minimize query response time.
- Debugged issues reported by QA in Hadoop jobs by running the jobs against the local file system.
- Experienced in managing and reviewing Hadoop log files.
- Good understanding of Cassandra architecture, including replication strategies, the gossip protocol and snitches.
Environment: Cloudera Manager, RHEL 6, MapReduce, Spark, Scala, Pig, Hive, Kafka, Sqoop, HDFS, AWS, Impala, YARN, PuTTY
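The partitioned external-table work described above follows a standard Hive DDL pattern; here is a small sketch that builds such a statement, with the database, table and column names all illustrative:

```scala
// Build a Hive CREATE EXTERNAL TABLE statement with partitions.
// All identifiers here are hypothetical examples.
def externalTableDdl(db: String, table: String,
                     cols: Seq[(String, String)],
                     partCols: Seq[(String, String)],
                     location: String): String = {
  def list(cs: Seq[(String, String)]) =
    cs.map { case (n, t) => s"$n $t" }.mkString(", ")
  s"""CREATE EXTERNAL TABLE IF NOT EXISTS $db.$table (${list(cols)})
     |PARTITIONED BY (${list(partCols)})
     |STORED AS PARQUET
     |LOCATION '$location'""".stripMargin
}

val ddl = externalTableDdl("media", "page_views",
  Seq("user_id" -> "BIGINT", "url" -> "STRING"),
  Seq("view_date" -> "STRING"),
  "hdfs:///data/media/page_views")
// The DDL would then be executed via spark.sql(ddl) or the Hive CLI.
```

Because the table is EXTERNAL, dropping it leaves the underlying HDFS data intact, and the partition column (`view_date` here) prunes scans for date-bounded queries.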
- Responsible for building scalable distributed data solutions using Hadoop.
- Created a data lake to absorb existing history data from OLAP databases and to process transactional data; coordinated with data modelers to create Hive tables replicating the current warehouse table structure.
- Migrated ETL processes from Oracle to Hive to validate faster, easier data manipulation.
- Performed data transformations in Pig and Hive and used partitions and buckets for performance improvements.
- Installed and configured a 10-node Hortonworks Hadoop cluster in a test environment using Amazon EC2 and EBS storage volumes for a POC.
- Developed MapReduce programs on Healthcare domain data to generate faster reports which were running slow in OBIEE dashboards due to rapid data growth.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for creating Hive tables, loading the structured output of MapReduce jobs into them and writing Hive queries to further analyze the logs and identify issues and behavioral patterns.
- Worked with support teams and resolved operational & performance issues.
- Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping and aggregation and how they translate to MapReduce.
- Configured Oozie workflows to run multiple Hive jobs independently, triggered by time and data availability.
Environment: HDFS, MapReduce, Hive, Sqoop, Pig, Java, Hortonworks, Oracle 11g, Oozie, SQL, CentOS
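The join-related performance fixes above usually come down to avoiding a shuffle when one side of a join is small. A minimal sketch of that map-side (broadcast-style) join logic, with illustrative data:

```scala
// Map-side join: the small relation is held in memory as a Map, so each
// record of the big relation is joined locally with a lookup instead of
// being shuffled to reducers. Generic over key and value types.
def mapSideJoin[K, A, B](big: Seq[(K, A)], small: Map[K, B]): Seq[(K, (A, B))] =
  big.flatMap { case (k, a) => small.get(k).map(b => (k, (a, b))) }

// In Hive the same effect comes from enabling hive.auto.convert.join or a
// MAPJOIN hint; in Pig, from JOIN ... USING 'replicated'; in Spark, from
// broadcasting the small side.
```

This is an inner join; keys missing from the small side are dropped, which mirrors what the engines do in their map-join paths.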
- Responsible for maintaining RAC and single-instance databases in production and non-production environments.
- Responsible for Oracle installations and database upgrades.
- Managed production, development and test databases in 24x7 environments.
- Monitored alert log files on a timely basis.
- Generated AWR, ADDM and ASH reports to analyze database performance issues.
- Exported and imported database backups using the exp/imp and expdp/impdp utilities.
- Created and monitored tablespaces, allocated tablespaces to users and configured archive-log mode for the database.
- Created and managed database objects such as tables, views and indexes; managed the physical and logical structure of the database; monitored physical and logical backup strategies; managed redo logs; and checked alert logs and trace files for errors.
- Created users with restricted access and privileges, created groups, roles and profiles, assigned users to groups and granted privileges and permissions to the appropriate groups.
- Experience in the installation, configuration and maintenance of OEM Grid Control.
- Experience migrating databases between platforms using Oracle Data Migration Assistant.
- Experienced in maintaining Oracle Active Data Guard.
- Experienced in cloning databases using hot/cold backups and the RMAN utility.
- Responsible for taking hot/cold and RMAN backups and recovering the database when required.
- Knowledge of Oracle database performance tuning with EXPLAIN PLAN, TKPROF, AUTOTRACE and AWR reports.
- Reorganized tables, fragmented tablespaces and databases using the export/import tools, with frequent rebuilding of indexes.
- Gathered database statistics to improve performance.
- Experience applying patches using the OPatch utility.
- Interacted with clients daily for requirement gathering, clarifications, resolutions and status updates.
Environment: Oracle 11gR2, RAC, RHEL 5, PuTTY, SQL Developer, OEM, ETL, Informatica, DAC, OBIEE, Siebel CRM
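The RMAN backup work above can be sketched as a minimal run block; the channel name is illustrative, and a real script would also set retention and backup-location policy:

```
RUN {
  # Allocate a disk channel for this session (name c1 is arbitrary)
  ALLOCATE CHANNEL c1 DEVICE TYPE DISK;
  # Full database backup including archived redo logs
  BACKUP DATABASE PLUS ARCHIVELOG;
  # Keep a current controlfile copy alongside the backup
  BACKUP CURRENT CONTROLFILE;
  RELEASE CHANNEL c1;
}
```

Recovery from such a backup follows the usual RESTORE DATABASE / RECOVER DATABASE sequence with the database mounted.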
- Responsible for gathering requirements, designing use cases, and producing the technical design and implementation.
- Implemented core framework components for executing workflows using Core Java, JDBC, Servlets and JSPs.
- Responsible for designing the front-end system using HTML, CSS, JSP, Servlets and Ajax.
- Transformed the web application into a compatible mobile and tablet application with responsive designs in HTML and CSS.
- Used LDAP for user Authentication and authorization.
- Created Stored Procedures, Views, Cursors and functions to support application.
- Used SVN as a repository for managing/deploying application code.
- Used FTP services to retrieve Flat Files from the external sources.
- Involved in system integration and user acceptance testing using JUnit.
- Worked with Business Analyst and Architects to develop applications based on project requirements.
- Collaborated with developers and performance engineers to enhance supportability and identify technical glitches.
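Calling the stored procedures mentioned above from the Java/JDBC framework follows the standard CallableStatement pattern. A sketch in Scala on top of `java.sql`; the procedure name, parameters and connection URL are hypothetical, not taken from the actual application:

```scala
import java.sql.{Connection, Types}

// Build the JDBC call-escape syntax for a procedure with n parameters.
def callSyntax(proc: String, nArgs: Int): String =
  s"{ call $proc(${Seq.fill(nArgs)("?").mkString(", ")}) }"

// Invoke a hypothetical procedure get_order_status(order_id IN, status OUT).
def fetchOrderStatus(conn: Connection, orderId: Long): String = {
  val cs = conn.prepareCall(callSyntax("get_order_status", 2))
  try {
    cs.setLong(1, orderId)                     // IN parameter
    cs.registerOutParameter(2, Types.VARCHAR)  // OUT parameter
    cs.execute()
    cs.getString(2)
  } finally cs.close()                         // always release the statement
}

// Usage (requires a reachable database and its driver on the classpath):
//   val conn = java.sql.DriverManager.getConnection(
//     "jdbc:oracle:thin:@//host:1521/svc", user, pass)
//   val status = fetchOrderStatus(conn, 1001L)
```

Wrapping the call in try/finally mirrors the resource handling the servlet-era framework code would need, since statements and connections are not auto-closed.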