- Overall 8+ years of experience in Big Data using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, MongoDB, Oozie, Sqoop, ZooKeeper, Java, J2EE, Web Services, and XML.
- Expertise with Hadoop ecosystem components including HDFS, MapReduce, Pig, Hive, Sqoop, Storm, YARN, and ZooKeeper.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, Spark Streaming, and Spark MLlib.
- Experience analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Experience in building data pipelines and defining data flow across large systems.
- Deep understanding of data import and export between relational databases and Hadoop clusters.
- Experience in handling data loads from Flume to HDFS.
- Experience in importing data from NoSQL stores such as MongoDB into HDFS.
- Experience in data extraction and transformation using MapReduce jobs.
- Experience in Big Data analytics using Cassandra, MapReduce, and relational databases.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, and Flume.
- Experience in extending Hive and Pig core functionality by writing custom UDFs.
- Experience in data management and implementation of Big Data applications using Hadoop frameworks.
- Excellent understanding of job workflow scheduling and coordination tools/services such as Oozie and ZooKeeper.
- Experience in writing MapReduce jobs in Java and Python.
- Extensive knowledge and work experience in systems analysis, design, development, implementation, and testing of application software for business solutions, database management, and data analytics.
- Hands-on experience in writing Pig Latin scripts, working with the Grunt shell, and scheduling workflows in Oozie.
- Worked on Classic and YARN distributions of Hadoop, including Apache Hadoop 2.0.0, Cloudera CDH4, and CDH5.
- Used IDEs and development environments such as Eclipse, NetBeans, Sun ONE Studio, WebSphere Studio, JBuilder, Elixir CASE, and Visual SourceSafe, and Erwin for database schema design.
- Sound RDBMS concepts; extensively worked on Oracle, DB2, SQL Server, MySQL, and Netezza using client tools such as Toad, SQL Developer, and WinSQL.
- Experienced in writing PL/SQL procedures and triggers in Oracle and stored procedures in DB2 and MySQL.
- Experience in working with columnar and NoSQL databases such as HBase, MongoDB, and Cassandra to manage extremely large data sets.
- Hands-on experience in in-memory data processing with Apache Spark.
- Configured and developed complex dashboards and reports in Splunk.
- Knowledge of Splunk architecture and components (indexer, forwarder, search head, deployment server).
- Strong experience with the Hortonworks and Cloudera Hadoop distributions.
- Skilled in Tableau Desktop for data visualization through various charts such as bar charts, line charts, combination charts, pivot tables, scatter plots, pie charts, and packed bubbles, using multiple measures for comparison such as Individual Axis, Blended Axis, and Dual Axis.
- Published dashboard reports to Tableau Server for navigating the developed dashboards on the web.
- Coordinated with the business to understand analytics requirements.
- Automated jobs that pull data from different sources into HDFS tables using Oozie workflows.
- Interfaced with SMEs, the analytics team, account managers, and domain architects to review to-be-developed solutions.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Excellent leadership, interpersonal, problem-solving, and time management skills.
- Excellent communication skills, both written (documentation) and verbal (presentation).
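The MapReduce-in-Python experience mentioned above is typically delivered through Hadoop Streaming, where the mapper and reducer are plain scripts reading stdin and writing tab-separated key/value lines to stdout. A minimal word-count sketch (the file name and invocation in the comment are illustrative, not from this resume):

```python
#!/usr/bin/env python3
"""Minimal Hadoop Streaming word-count sketch.

Hadoop Streaming pipes each input split through the mapper on stdin,
sorts/shuffles the mapper output by key, then pipes the grouped lines
into the reducer on stdin.
"""
import sys
from itertools import groupby

def mapper(lines):
    """Emit one tab-separated (word, 1) pair per word."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(lines):
    """Sum counts per word; input must be sorted by key (Hadoop's shuffle)."""
    keyed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__" and not sys.stdin.isatty():
    # e.g. hadoop jar hadoop-streaming.jar -mapper 'wordcount.py map' \
    #      -reducer 'wordcount.py reduce' ...   (illustrative invocation)
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    step = mapper if stage == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Because both stages are ordinary stdin/stdout filters, they can be tested locally with `sort` standing in for the shuffle.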
Languages/Tools: Java, C, C++, C#, Scala, VB, XML, HTML/XHTML, HDML, DHTML.
Big Data: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Spark, Mahout, Kafka, Storm, Cassandra, Solr, Impala, Greenplum, MongoDB.
Web/Distributed Technologies: J2EE, Servlets, JSP, Struts, Hibernate, JSF, JSTL, EJB, RMI, JNI, XML, JAXP, XSL, XSLT, UML, MVC, Spring, CORBA, Java Threads.
Browser Languages/Scripting: HTML, XHTML, CSS, XML, XSL, XSD, XSLT, JavaScript, HTML DOM, DHTML, AJAX.
App/Web Servers: IBM WebSphere …, BEA WebLogic, JDeveloper, Apache Tomcat, JBoss.
GUI Environment: Swing, AWT, Applets.
Messaging & Web Services Technology: SOAP, WSDL, UDDI, XML, SOA, JAX-RPC, IBM WebSphere MQ v5.3, JMS.
Testing & Case Tools: JUnit, Log4j, Rational ClearCase, CVS, Ant, Maven, JBuilder.
Configuration Management: Chef, Puppet, Ansible, Docker.
Build Tools: CVS, Subversion, Git, Ant, Maven, Gradle, Hudson, TeamCity, Jenkins.
CI Tools: Jenkins, Bamboo.
Scripting Languages: Python, Shell (Bash), Perl, PowerShell, Ruby, Groovy.
Monitoring Tools: Nagios, CloudWatch, JIRA, Bugzilla, and Remedy.
Databases: Oracle, MS SQL Server 2000, DB2, MS Access, MySQL, Teradata; NoSQL: Cassandra, Greenplum, and MongoDB.
Operating Systems: Windows, Unix, Linux (Red Hat, SUSE), Sun Solaris, Ubuntu, CentOS.
Confidential, Chicago, IL
- Worked with Hadoop tools such as Hive, HiveQL, Pig, HBase, OLAP, ZooKeeper, Avro, Parquet, and Sqoop on Arch Linux.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Experience working with DevOps.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
- Experience applying structured models to unstructured data.
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Worked on the Hortonworks Data Platform (HDP).
- Worked with Splunk to analyze and visualize data.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Responsible for developing data pipelines using Azure HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs; used Sqoop to import and export data between HDFS and RDBMSs for visualization and report generation.
- Involved in migrating ETL processes from Oracle to Hive to test easier data manipulation.
- Worked on functional, system, and regression testing activities with Agile methodology.
- Worked on a Python plug-in for MySQL Workbench to upload CSV files.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Worked with Accumulo to modify server-side key-value pairs.
- Working experience with Vertica, Qlik Sense, QlikView, and SAP BOE.
- Worked with NoSQL databases such as HBase, Cassandra, and DynamoDB.
- Worked with AWS-based data ingestion and transformations.
- Good experience with Python, Pig, Sqoop, Oozie, Hadoop Streaming, Hive, and Phoenix.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop, including cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
- Developed several new MapReduce programs to analyze and transform data, uncovering insights into customer usage patterns.
- Worked extensively on importing metadata into Hive using Sqoop and migrated existing tables and applications to work on Hive.
- Responsible for running Hadoop Streaming jobs to process terabytes of XML data; utilized cluster coordination services through ZooKeeper.
- Extensive experience using message-oriented middleware (MOM) with ActiveMQ, Apache Storm, Apache Spark, Kafka, Maven, and ZooKeeper.
- Worked extensively on the Spark Core and Spark SQL modules.
- Worked on descriptive statistics using R.
- Developed Kafka producers and consumers, HBase clients, Spark, Shark, and Streaming components, and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Analyzed SQL scripts and designed solutions implemented with PySpark.
- Experience using Spark with Neo4j to acquire interrelated graph information about insurers and to query data from the stored graphs.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
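The daemon health-check scripting described above usually amounts to parsing the output of `jps` (the JVM process lister) and flagging missing Hadoop daemons. A minimal sketch under that assumption; the daemon list and the alerting hook are illustrative, not from the resume:

```python
"""Sketch of a Hadoop daemon health check: parse `jps` output
('<pid> <name>' per line) and report expected daemons that are missing."""

# Illustrative set; a real check would match the node's role.
EXPECTED_DAEMONS = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}

def missing_daemons(jps_output: str) -> set[str]:
    """Return the expected daemons absent from the given `jps` output."""
    running = {parts[1] for line in jps_output.splitlines()
               if len(parts := line.split()) == 2}
    return EXPECTED_DAEMONS - running

if __name__ == "__main__":
    import shutil
    import subprocess
    if shutil.which("jps"):  # only attempt when the JDK tool is installed
        out = subprocess.run(["jps"], capture_output=True, text=True).stdout
        for daemon in sorted(missing_daemons(out)):
            print(f"WARNING: {daemon} is not running")  # alert/restart hook here
```

In a cron-driven script the warning branch would trigger a restart or a page rather than just printing.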
Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, OLAP, data modeling, Linux, Hadoop MapReduce, HBase, Shell Scripting, MongoDB, Cassandra, Apache Spark, Neo4j.
Confidential, New York, NY
- Involved in installing Hadoop ecosystem components and responsible for managing data coming from various sources.
- Involved in Hadoop cluster environment administration, including adding and removing cluster nodes.
- Supported MapReduce programs running on the cluster and was involved in HDFS maintenance and administration through the Hadoop Java API.
- Configured the Fair Scheduler to provide service-level agreements for multiple users of a cluster.
- Maintained and monitored clusters; loaded data into the cluster from dynamically generated files.
- Implemented data ingestion from relational database management systems using Sqoop.
- Managed node connectivity and security on the Hadoop cluster.
- Used Pig as an ETL tool to perform transformations, event joins, and traffic filtering, plus some pre-aggregations, before storing the data in HDFS.
- Involved in writing Flume and Hive scripts to extract, transform, and load data into the database; handled cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
- Responsible for loading data into Spark RDDs and performing in-memory computation to generate the output response.
- Developed Spark code using Java and Spark SQL/Streaming for faster data processing.
- Responsible for managing data coming from various sources and consolidating it into a JSON file.
- Wrote custom UDFs for Hive to pull customized data.
- Experienced in loading and transforming large sets of structured and semi-structured data and optimizing existing algorithms in Hadoop using SparkContext, Hive SQL, and DataFrames.
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
- Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcasts, effective and efficient joins, and transformations during the ingestion process itself.
- Experience in writing complex SQL to optimize Hive queries.
- Converted text files and CSV files to Parquet format for data analysis.
- Designed the data pipeline from sources to Hadoop.
- Prepared the mapping document specifying which fields to use from the Hive database and which transformations to perform.
- Developed UNIX shell scripts to send a mail notification upon job completion, with either a success or failure notation.
- Performed analytics, data mining, and data visualization over the data using Hive.
- Stored the data in a Snowflake database.
- Stored data in S3 buckets on an AWS cluster on top of Hadoop.
- Performed effectiveness testing of customer data from the source output database.
- Worked in an Agile environment and used JIRA as a bug-reporting tool for updating bug reports.
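Consolidating multi-source data into a single JSON file, as described above, is mostly a normalization exercise: read each source into a common record shape, merge, and serialize once. A minimal sketch; the CSV/in-memory sources and field names are illustrative assumptions:

```python
"""Sketch of consolidating records from heterogeneous sources into one
JSON document. A CSV export and an already-parsed API feed stand in for
the 'various sources' mentioned above."""
import csv
import io
import json

def consolidate(csv_text: str, api_records: list[dict]) -> str:
    """Normalize CSV rows and API records into one JSON array string."""
    csv_rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    merged = csv_rows + api_records  # both sources now share a dict shape
    return json.dumps(merged, sort_keys=True)
```

Usage would be writing `consolidate(...)` to a single output file that downstream jobs ingest.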
Environment: Cloudera Hadoop, Linux, AWS, HDFS, Hive, Spark, Sqoop, Flume, ZooKeeper, HBase.
Confidential, Kansas City, MO
- Involved in architecture design, development, and implementation of Hadoop deployment, backup, and recovery systems, following Agile.
- Contributed to the development of key data integration and advanced analytics solutions leveraging Apache Hadoop and other big data technologies for leading organizations, using the Cloudera Hadoop distribution.
- Implemented Amazon AWS services (EC2, RDS, S3, Redshift, etc.) with tools including Hadoop, Hive, Pig, Sqoop, Oozie, HBase, Flume, and Spark.
- Implemented data logging directly into HDFS using Flume in Cloudera CDH and was involved in loading data from the Linux file system to HDFS in Cloudera CDH.
- Experience in running Hadoop Streaming jobs to process terabytes of XML-format data, importing and exporting data into HDFS, and assisting in exporting analyzed data to RDBMSs using Sqoop in Cloudera.
- Designed and developed integration APIs using various data structure concepts and the Java Collections Framework, along with exception-handling mechanisms, to return responses within 500 ms; used Java threads to handle concurrent requests.
- Installed and configured MapReduce, Hive, and HDFS; developed Spark scripts in Java per the requirements to read and write JSON files; worked on importing and exporting data into HDFS and Hive using Sqoop.
- Hands-on experience in Hadoop administration, development, and NoSQL in Cloudera; loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Implemented HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive; configured and installed Hadoop and its ecosystem components (Hive, Pig, HBase, Sqoop, Flume).
- Designed and implemented a distributed data storage system based on HBase and HDFS; imported and exported data into HDFS and Hive.
- Designed and implemented a data warehouse, creating fact and dimension tables and loading them with Informatica PowerCenter tools, fetching data from the OLTP system into the analytics data warehouse. Coordinated with business users to gather new requirements and worked on existing issues; read multiple data formats on HDFS using Scala; loaded data into Parquet files by applying transformations using Impala; executed parameterized Pig, Hive, Impala, and UNIX batches in production.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala; analyzed SQL scripts and designed solutions implemented with Scala.
- Involved in investigating any issues that came up; experienced in resolving issues through Root Cause Analysis, Incident Management, and Problem Management processes.
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
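Converting a Hive/SQL aggregation into Spark RDD transformations, as in the bullets above, usually means expressing a GROUP BY as a map step followed by reduceByKey. The sketch below emulates that pipeline shape in plain Python over a list (a stand-in, not real PySpark; the employee data is invented for illustration):

```python
"""Emulates SELECT dept, SUM(salary) FROM emp GROUP BY dept as the
map -> reduceByKey pipeline one would write over a Spark RDD; with
PySpark the same shape would be
rdd.map(lambda e: (e["dept"], e["salary"])).reduceByKey(add)."""
from operator import add

def reduce_by_key(pairs, fn):
    """Minimal stand-in for RDD.reduceByKey: fold values per key."""
    acc = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return sorted(acc.items())

emp = [{"dept": "eng", "salary": 100}, {"dept": "eng", "salary": 50},
       {"dept": "ops", "salary": 70}]
pairs = [(e["dept"], e["salary"]) for e in emp]  # the map step
totals = reduce_by_key(pairs, add)               # the reduceByKey step
```

The point of the translation is that reduceByKey combines values per key on each partition before shuffling, which is what makes the Spark version faster than a naive group-then-sum.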
Environment: Hadoop 1.x/2.x MR1, Cloudera CDH3U6, HDFS, Spark, Scala, Impala, HBase 0.90.x, Flume, Java, Sqoop, Hive, Tableau.
Confidential, Eden Prairie, MN
- Involved in review of functional and non-functional requirements.
- Facilitated knowledge transfer sessions.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Extracted files from RDBMSs through Sqoop, placed them in HDFS, and processed them.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
- Load and transform large sets of structured, semi structured and unstructured data.
- Responsible to manage data coming from various sources.
- Gained good experience with NoSQL databases such as HBase.
- Supported MapReduce programs running on the cluster.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Gained very good business knowledge of health insurance, claim processing, fraud suspect identification, the appeals process, etc.
- Developed a custom file system plug-in for Hadoop so it can access files on the data platform; this plug-in allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Wrote programs in Spark using Scala, using RDDs for transformations and performing actions on them.
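Besides Java UDFs like those mentioned above, Hive can run row-level logic in Python via its TRANSFORM clause, which streams tab-separated rows through an external script. A hypothetical masking transform as a sketch (the column layout and masking rule are illustrative assumptions):

```python
#!/usr/bin/env python3
"""Row transform for a Hive TRANSFORM (...) USING 'python3 mask.py' clause:
reads tab-separated rows on stdin, masks the second column, writes rows back."""
import sys

def mask(value: str) -> str:
    """Keep the last four characters and mask the rest with '*'."""
    return "*" * max(len(value) - 4, 0) + value[-4:]

def transform(lines):
    """Apply the mask to column 2 of each tab-separated input row."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2:
            cols[1] = mask(cols[1])
        yield "\t".join(cols)

if __name__ == "__main__" and not sys.stdin.isatty():
    for row in transform(sys.stdin):
        print(row)
```

Hive serializes each selected column as tab-separated text before invoking the script, so the same file can be unit-tested locally with plain strings.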
Environment: Java 6 (JDK 1.6), Eclipse, Oracle 11g/10g, Red Hat Linux, MapReduce, HDFS, Hive, Spark, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting.
- Involved in requirements collection & analysis from the business team.
- Created the design documents with use case, class, and sequence diagrams using Rational Rose.
- Implemented the MVC architecture using the Apache Struts framework.
- Implemented Action classes and server-side validations for account activity, payment history, and transactions.
- Implemented views using Struts tags, JSTL, and Expression Language.
- Implemented session beans to handle the business logic for fund transfer, loan, credit card & fixed deposit modules.
- Worked with various Java patterns, such as Singleton and Factory, at the business layer for effective object behavior.
- Worked on the Java collections API for handling the data objects between the business layers and the front end.
- Developed unit test cases using JUnit.
- Developed Ant scripts and builds using Apache Ant.
- Used ClearCase for source code maintenance.
- Worked with business analyst in understanding business requirements, design and development of the project.
- Implemented the JSP framework with MVC architecture.
- Created new JSPs for the front end using HTML, JavaScript, jQuery, and AJAX.
- Involved in creating RESTful web services using JAX-RS and the Jersey tool.
- Involved in designing, creating, reviewing Technical Design Documents.
- Developed DAOs (Data Access Object) using Hibernate as ORM to interact with DBMS - Oracle.
- Applied J2EE design patterns like Business Delegate, DAO and Singleton.
- Involved in developing DAO's using JDBC.
- Worked with QA team in preparation and review of test cases.
- Used JUnit for unit testing and as the integration testing tool.
- Writing SQL queries to fetch the business data using Oracle as database.
- Developed the UI for customer service modules and reports using JSF, JSPs, and MyFaces components.
- Used Log4j for logging the running system's application log, to trace errors and certain automated routine functions.