- 7 years of professional experience in Requirements Analysis, Design, Development and Implementation of Java, J2EE and Big Data technologies.
- 4+ years of exclusive experience in Big Data technologies and Hadoop ecosystem components like Spark, MapReduce, Hive, Pig, YARN, HDFS, Sqoop, Flume, Kafka and NoSQL systems like HBase, Cassandra.
- Strong knowledge of the architecture of distributed systems and parallel processing; in-depth understanding of the MapReduce framework and the Spark execution framework.
- Expertise in writing end-to-end data processing jobs to analyze data using MapReduce, Spark and Hive.
- Extensive experience in working with structured data using HiveQL, join operations and custom UDFs, and in optimizing Hive queries.
- Experience using various Hadoop distributions (Cloudera, Hortonworks, Amazon AWS) to fully implement and leverage new Hadoop features.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into forming baseline data.
- Extensive experience in importing/exporting data between RDBMS and the Hadoop ecosystem using Apache Sqoop.
- Worked with the Java HBase API to ingest processed data into HBase tables.
- Strong experience in working with UNIX/LINUX environments, writing shell scripts.
- Good knowledge of and experience with the real-time streaming technologies Spark and Kafka.
- Experience in optimizing MapReduce jobs using Combiners and Partitioners to deliver the best results.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Extensive experience in working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Sound knowledge of J2EE architecture, design patterns, objects modeling using various J2EE technologies and frameworks.
- Adept at creating Unified Modeling Language (UML) diagrams such as Use Case diagrams, Activity diagrams, Class diagrams and Sequence diagrams using Rational Rose and Microsoft Visio.
- Experienced in using Agile methodologies including extreme programming, SCRUM and Test Driven Development (TDD).
- Proficient in integrating and configuring the Object-Relational Mapping tool Hibernate in J2EE applications, along with other open-source frameworks like Struts and Spring.
- Experience in building and deploying web applications on multiple application servers and middleware platforms including WebLogic, WebSphere, Apache Tomcat and JBoss.
- Experience in writing test cases in Java Environment using JUnit.
- Hands on experience in development of logging standards and mechanism based on Log4j.
- Experience in building, deploying and integrating applications with ANT, Maven.
- Good knowledge of Web Services, SOAP programming, WSDL, XML parsers like SAX and DOM, and front-end technologies such as AngularJS and responsive design with Bootstrap.
- Demonstrated technical expertise, organization and client service skills in various projects undertaken.
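As a rough illustration of the combiner optimization mentioned above (local pre-aggregation on each mapper before the shuffle), here is a minimal word-count sketch in plain Python; the input splits are hypothetical and the cluster machinery is simulated in-process:

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit one (word, 1) pair per token."""
    for line in lines:
        for word in line.split():
            yield word, 1

def combine(pairs):
    """Combiner: pre-aggregate counts per mapper to cut shuffle volume."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts.items()

def reduce_phase(shuffled):
    """Reduce: sum the partial counts grouped by key."""
    ordered = sorted(shuffled, key=itemgetter(0))
    return {word: sum(n for _, n in group)
            for word, group in groupby(ordered, key=itemgetter(0))}

# Two hypothetical input splits; each "mapper" runs its own combiner.
split1 = ["spark hive spark", "hive"]
split2 = ["spark kafka"]
partials = list(combine(map_phase(split1))) + list(combine(map_phase(split2)))
print(reduce_phase(partials))  # {'hive': 2, 'kafka': 1, 'spark': 3}
```

Without the combiner, six (word, 1) pairs would cross the shuffle; with it, only four partial counts do.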
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Zookeeper, YARN, TEZ, Flume, Spark, Kafka
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI and Java Beans
Databases: Teradata, Oracle 11g/10g, MySQL, DB2, SQL Server, NoSQL (HBase, MongoDB)
Programming Languages: Java, jQuery, Scala, Python, UNIX Shell Scripting
IDE: Eclipse, NetBeans, PyCharm
Integration & Security: MuleSoft, Oracle IDM & OAM, SAML, EDI, EAI
Build Management Tools: Maven, Apache Ant
Web Services: SOAP, REST
Predictive Modelling Tools: SAS Editor, SAS Enterprise Guide, SAS Miner, IBM Cognos
Scheduling Tools: crontab, AutoSys, Control-M
Visualization Tools: Tableau, Arcadia Data
Confidential, Plano, TX
- Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala and Cassandra with the Hortonworks distribution.
- Installed Hadoop, MapReduce, HDFS and AWS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Assisted in upgrading, configuring and maintaining various Hadoop infrastructure components like Pig, Hive and HBase.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Imported data from different sources like HDFS/HBase into Spark RDDs.
- Built a POC on Single Member Debug on Hive/HBase and Spark.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Loaded data into HBase using both bulk and non-bulk loads.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Expertise in data modeling and data warehouse design and development.
Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Kafka, Solr, HBase, Oozie, Flume, Spark Streaming/SQL, Java, SQL Scripting, Linux Shell Scripting.
Confidential, Phoenix, AZ
- Installed and configured the Hadoop environment.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Installed and configured Pig and wrote Pig Latin scripts.
- Used Pig and MapReduce to analyze XML files and log files.
- Imported data using Sqoop to load data from IBM DB2 into HDFS on a regular basis.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked on them using HiveQL.
- Imported and exported data into HDFS and Hive using Sqoop from IBM DB2 and Netezza databases.
- Used Oozie workflows to coordinate Pig and Hive scripts.
- Used Impala for querying HDFS data to achieve better performance.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Developed UDFs in both Pig and Hive to pre-process the data and compute various metrics for reporting.
- Developed a MapReduce program to convert mainframe fixed-length data to delimited data.
- Ingested data from various IBM DB2 tables to HDFS using Sqoop.
- Automated Python scripts to pull and synchronize code in the GitHub environment.
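The fixed-length-to-delimited conversion mentioned above boils down to slicing each mainframe record by its column widths and re-joining the fields with a delimiter. A minimal Python sketch, with a hypothetical copybook layout (NAME 10 chars, AGE 3 chars, ZIP 5 chars):

```python
def fixed_to_delimited(record, widths, delimiter="|"):
    """Slice a fixed-width record into fields by column width and join
    them with a delimiter, trimming the padding spaces."""
    fields, pos = [], 0
    for w in widths:
        fields.append(record[pos:pos + w].strip())
        pos += w
    return delimiter.join(fields)

# Hypothetical record: NAME(10) + AGE(3) + ZIP(5) = 18 characters.
record = "JOHN DOE  04210001"
print(fixed_to_delimited(record, [10, 3, 5]))  # JOHN DOE|042|10001
```

In a real MapReduce job this function would sit in the mapper, applied once per input line; the widths would come from the copybook definition.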
Environment: Hadoop, CDH, MapReduce, HDFS, Pig, Hive, Oozie, Java, UNIX, Flume, Impala, HBase, Oracle, MapR, AutoSys, Mainframes, JCL, IBM DB2, NDM.
Confidential, Columbus, OH
- Involved in requirement analysis, design, coding and implementation.
- Worked in Agile methodology and used JIRA to maintain project stories.
- Analyzed large data sets by running Hive queries.
- Involved in designing and developing the Hive data model, loading it with data and writing Java UDFs for Hive.
- Handled importing and exporting data into HDFS, analyzed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
- Used Sqoop to import and export data between the Hadoop Distributed File System (HDFS) and RDBMS.
- Created Hive tables and loaded data from HDFS into Hive tables as per the requirement.
- Built custom MapReduce programs to analyze data and used HQL queries to clean unwanted data.
- Created components like Hive UDFs for missing functionality in Hive to analyze and process the large volumes of data.
- Worked on various performance optimizations like using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Involved in writing complex queries to perform join operations between multiple tables.
- Actively verified and tested data in HDFS and Hive tables while Sqooping data from Hive to RDBMS tables.
- Developed scripts and scheduled AutoSys jobs to filter the data.
- Monitored AutoSys file-watcher jobs, tested data for each transaction and verified whether each run completed properly.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Used Impala to pull data from Hive tables.
- Used Apache Maven 3.x to build and deploy the application to various environments. Installed the Oozie workflow engine to run multiple Hive jobs independently based on time and data availability.
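The map-side join optimization mentioned above can be sketched in plain Python: the small table is held in memory as a dict (the distributed-cache analogue), so each mapper joins its rows locally and no shuffle is needed. The table contents here are hypothetical:

```python
def map_side_join(large_rows, small_table):
    """Map-side (broadcast) join: load the small table into an in-memory
    dict, then stream the large table and join each row locally."""
    lookup = dict(small_table)  # stands in for the distributed cache
    for key, value in large_rows:
        if key in lookup:  # inner join: drop rows with no match
            yield key, value, lookup[key]

# Hypothetical tables: a large orders stream and a small customer lookup.
orders = [("c1", 100), ("c2", 250), ("c3", 75)]
customers = [("c1", "Alice"), ("c3", "Cleo")]
print(list(map_side_join(orders, customers)))
# [('c1', 100, 'Alice'), ('c3', 75, 'Cleo')]
```

This mirrors what Hive does with a MAPJOIN hint (or `hive.auto.convert.join`) when one side of the join fits in memory.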
Environment: HDFS, Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, Oozie, MongoDB, Java 6/7, Oracle 10g, Subversion, Toad, UNIX Shell Scripting, SOAP, REST services, Agile Methodology, JIRA, AutoSys.