- A dynamic professional with around 9 years of diversified experience in Information Technology, with an emphasis on the Big Data/Hadoop ecosystem, SQL/NoSQL databases, and Java/J2EE technologies and tools, using industry-accepted methodologies and procedures.
- Hadoop Developer: Worked extensively with Hadoop tools including Pig, Hive, Oozie, Sqoop, Spark, DataFrames, HBase and MapReduce programming. Applied partitioning and bucketing in Hive and designed both managed and external tables to optimize performance. Developed Spark applications using Scala for easy Hadoop transitions. Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive. Developed Spark code using Spark SQL and Spark Streaming for faster testing and processing of data.
- Hadoop Distributions: Worked with Apache Hadoop along with the enterprise distributions from Cloudera and Hortonworks. Good knowledge of the MapR distribution.
- Data Ingestion into Hadoop (HDFS): Ingested data into Hadoop from various data sources such as Oracle and MySQL using Sqoop. Created Sqoop jobs with incremental load to populate Hive external tables. Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- File Formats: Ran Hadoop streaming jobs to process terabytes of text data. Worked with different file formats such as Text, SequenceFile, Avro, ORC and Parquet.
- Scripting and Reporting: Created scripts for data analysis with Pig, Hive and Impala. Used Ant scripts for creating and deploying .jar, .ear and .war files. Generated reports, extracts and statistics on distributed data on the Hadoop cluster. Developed Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Custom Coding: Wrote custom UDFs (User Defined Functions) in Java for Hive and Pig to extend their functionality. Used HCatalog for simple query execution. Wrote code and built JAR files for functionality unavailable in Pig and Hive. Used Maven as the build automation tool when creating the JAR files for custom tasks.
- Java Experience: Created applications in core Java and built applications requiring database access and persistent connectivity, such as client-server models, using JDBC, JSP, Spring and Hibernate. Implemented web services for network-related applications in Java.
- Methodologies: Hands-on experience working with software methodologies such as Waterfall and Agile.
Technologies/Tools
Languages/Tools: Java, XML, XSLT, HTML/XHTML, HDML, DHTML, Python, Scala, R, Git.
Big Data Technologies: Apache Hadoop, HDFS, Spark, Hive, Pig, Talend, HBase, Sqoop, Oozie, ZooKeeper, Mahout, Splunk, Flink, Solr, Kafka, Storm, Cassandra, Impala, Hue, NiFi, Tez, Greenplum, MongoDB, Scala.
Java Technologies: Java SE architecture, OOP concepts, Java EE, JDBC, JNDI, JSF (JavaServer Faces), Spring, Hibernate, SOAP/REST web services.
Web Technologies: HTML, XML, JavaScript, WSDL, SOAP, JSON, AngularJS.
Databases/NoSQL: MS SQL Server, MySQL, HBase, Oracle, MS Access, Teradata, Netezza.
Confidential, Charlotte, NC
- Created Sqoop jobs to import data from DB2 into HDFS and created external Hive tables.
- Wrote an FTP script to copy CSV files from a Windows shared drive to AWS S3.
- Wrote Oozie workflows to run the Sqoop and HQL scripts on Amazon EMR.
- Wrote a shell script to count rows for all tables in the Hive database by ingestion date and append the results to logs.
- Created partitions in Hive.
- Created multiple Splunk dashboards covering 24-hour, 80-hour and 4-year windows to help business users retrieve and analyze data.
- Created Splunk alerts that email team members when Sqoop ingestion volume varies by more than 50 percent in either direction. Created new variables in Splunk and assigned regular expressions to them to distinguish them from existing ones.
- Used Bitbucket as the code repository, created branches, and ran builds with Bamboo.
- Created and automated Control-M jobs for the daily ingestion process.
- Monitored daily jobs in Control-M and debugged the issues that caused scheduled jobs to fail.
- Provided support for both production and non-production environments.
- Automated a Python script in an AWS Lambda function to copy files from one S3 bucket to another.
- Created a Spark SQL job to join three Hive tables, write the result to a Hive table, and store it on S3.
- Worked on Spark SQL jobs to join multiple Hive tables, write the results to a final Hive table, and store them on S3.
- Implemented Spark RDD transformations to map business analysis logic and applied actions on top of the transformations.
- Created Spark jobs to run lightning-fast analytics over the Spark cluster.
- Used DataFrames for data transformations.
- Helped develop code to write canonical-model JSON records from numerous input sources to Kafka queues.
- Collected data using Spark Streaming and loaded it into HBase.
- Fetched data for and generated monthly reports, and visualized them using Tableau.
Environment: Hadoop, Hive, Linux, MapReduce, Sqoop, FTP, DB2, Cognos, Spark, Shell Scripting, Agile methodology, AWS, Splunk, Oozie, Kafka, Bitbucket, HBase, Bamboo.
Confidential, Plano, TX
- Used the Cloudera distribution for the Hadoop ecosystem.
- Analyzed the Hadoop cluster and different big data analytics tools including MapReduce, Pig, Hive and Spark.
- Created Sqoop jobs to import the data from Oracle to HDFS.
- Loaded data into HDFS and Hive using Sqoop for report analysis.
- Developed user-defined functions in Hive to run aggregations over multiple rows of data loaded from HDFS.
- Developed Pig Latin scripts to load data from output files and store it in HDFS.
- Used the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Implemented various machine learning techniques such as Random Forest, K-Means and Logistic Regression for prediction and pattern identification using Spark MLlib.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Wrote Hive queries for data analysis to meet the business requirements.
- Developed Kafka producers and consumers for message handling.
- Used Storm as an automated mechanism to analyze large volumes of non-unique data points with low latency and high throughput.
- Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Created an AWS Lambda function to automate copying objects from one S3 bucket to another.
- Created an IAM role to allow copying objects between S3 buckets.
- Fetched data for and generated monthly reports, and visualized them using Tableau.
- Created Sqoop jobs to import data from relational database systems into HDFS.
- Automated and scheduled the Sqoop jobs using Unix shell scripts.
- Extensively used Pig for data cleansing.
- Scheduled Oozie workflows to run multiple Hive and Pig jobs.
- Wrote Python scripts to analyze customer data.
- Created partitioned tables in Hive.
- Designed a data warehouse using Hive.
- Developed Hive queries for the analysts.
- Developed shell scripts to automate test scripts run through Impala.
- Wrote SQL stored procedures in Hue to access data from Impala.
- Wrote tables to Impala for faster retrieval using different file formats.
- Tuned query performance in Impala for faster retrieval.
- Captured web server logs into HDFS using Flume and Splunk for analysis.
- Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
- Helped migrate MapReduce jobs to Spark jobs and used the Spark SQL and DataFrames APIs to load structured data into Spark clusters.
- Designed and developed Scala workflows to pull data from cloud-based systems and apply transformations to it.
- Tuned HBase and MySQL to optimize data access.
- Developed Tableau visualizations and dashboards using Tableau Desktop.
- Worked with BI (Business Intelligence) teams to generate reports and design ETL workflows in Tableau. Loaded data from various sources into HDFS and built reports using Tableau.
- Worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Deployed Cloudera Hadoop Cluster on AWS for Big Data Analytics.
- Used Git for code versioning while following a Gitflow workflow.
- Helped load data from Linux file systems, servers and Java web services using Kafka producers and consumers.
- Coordinated with individuals at various levels to prioritize multiple projects. Estimated scope and schedule, and tracked projects throughout the SDLC.
Environment: Hadoop, Hive, Impala, Linux, MapReduce, Sqoop, Kafka, Spark, HBase, Shell Scripting, Eclipse, Maven, Java, Agile methodologies, AWS, Talend, Splunk, Tableau, Oozie.
Confidential, Columbus, GA
- Used the Hortonworks distribution for the Hadoop ecosystem.
- Created Sqoop jobs in Oozie workflow.
- Monitored multiple Hadoop cluster environments using Ganglia.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Used Celery as the task queue with RabbitMQ and Redis as message brokers to execute asynchronous tasks.
- Created Hive databases and granted appropriate permissions through Ranger policies.
- Evaluated Hortonworks NiFi (HDF 2.0) and recommended a solution to ingest data from multiple data sources into HDFS and Hive using NiFi.
- Tested Apache Tez, an extensible framework for building high performance batch and interactive data processing applications, on Pig and Hive jobs.
- Worked with developer teams to move data into HDFS through HDF NiFi.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Helped design MapReduce jobs on the Greenplum Hadoop system (HDFS).
- Worked with MPP systems to extract or load data as needed.
- Developed efficient MapReduce programs on AWS to process more than 4 years' worth of claim data and detect and separate fraudulent claims.
- Implemented monitoring and established best practices around usage of Elasticsearch.
- Monitored local file system disk space and CPU usage using Ambari.
- Wrote Python scripts to update content in the database and manipulate files.
- Developed a scalable, cost-effective and fault-tolerant data warehouse system on Amazon EC2.
- Developed Data Warehousing and ETL processes.
- Imported data from various sources into the Cassandra cluster using Java APIs.
Environment: Hadoop 1.x, HDFS, MapReduce, Hive 10.0, Pig, Sqoop, Ganglia, Cassandra, Shell Scripting, AWS, MySQL, Hortonworks, Ubuntu 13.04.
Confidential, San Francisco, CA
- Extracted data from flat files and RDBMS databases into a staging area and loaded it into the data warehouse.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Scheduled multiple Hive and Pig jobs with the Airflow workflow engine using Python.
- Imported and exported data to and from HDFS and Hive using Sqoop.
- Developed jobs in Talend Enterprise edition from stage to source, intermediate, conversion and target.
- Developed several reports using Kibana on top of Elasticsearch.
- Added business logic to parse data from multi-tenant SQL Server databases into single-tenant Teradata Aster databases.
- Involved in loading data from UNIX file system to HDFS.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Tuned JVM performance to improve MapReduce job performance.
Environment: Hadoop, MapReduce, HDFS, Hive, DynamoDB, Oracle 11g, Java, Struts, Servlets, HTML, Airflow, XML, SQL, J2EE, JUnit, Teradata, Tomcat 6, Talend.
Confidential, St. Louis, MO
- Developed Maven scripts to build and deploy the application.
- Developed Spring MVC controllers for all the modules.
- Implemented jQuery validator components.
- Extracted data from Oracle as one of the source databases.
- Used the DataStage ETL tool to copy data from Teradata to Netezza.
- Created ETL data mapping spreadsheets describing column-level transformation details for loading data from Teradata landing zone tables into the tables in the Party and Policy subject areas of the EDW, based on the SAS Insurance model.
- Used JSON and XML documents extensively with the MarkLogic NoSQL database. Made REST API calls using Node.js and the Java API.
- Built data transformation with SSIS including importing data from files.
- Loaded flat-file data into the staging area using Informatica.
- Created shell scripts for generic use.
Environment: Java, Spring, MPP, Windows XP/NT, Informatica PowerCenter 9.1/8.6, UNIX, Teradata, Oracle Designer, Autosys, Shell, Quality Center 10.
Confidential
- Involved in the analysis, design, implementation, and testing of the project.
- Developed web components using JSP, Servlets and JDBC.
- Implemented database using SQL Server.
- Implemented the Spring IoC framework.
- Developed Spring REST services for all the modules.
- Developed custom SAML and SOAP integration for healthcare.
- Used DAO and JDBC for database access.
- Built responsive Web pages using Kendo UI mobile.
- Analyzed requirements and prepared the requirements analysis document.
- Deployed the application to the JBoss application server.
- Implemented web services over SOAP using Apache Axis.
- Gathered requirements from the various parties involved in the project.
- Used J2EE and EJB to handle the business flow and functionality.
- Involved in the complete SDLC, including full system dependencies.
- Actively coordinated with deployment manager for application production launch.
- Monitored test cases to verify actual results against expected results.
- Carried out regression testing to track problems.
Environment: Java, J2EE, EJB, UNIX, XML, Work Flow, JMS, JIRA, Oracle, JBoss, SOAP.