- Software developer with 8+ years of professional experience spanning analysis, design, development, integration, deployment, and maintenance of quality software applications using Java/J2EE and big data Hadoop technologies.
- Over 5 years of experience in data analysis and processing using the big data stack.
- Proficient in Java, Hadoop MapReduce, Pig, Hive, Oozie, Sqoop, Flume, HBase, Scala, Python, Kafka, Impala, NoSQL databases, and AWS.
- Strong exposure to big data technologies and the Hadoop ecosystem, with an in-depth understanding of MapReduce and the Hadoop infrastructure.
- Excellent knowledge of Hadoop architecture and ecosystem components under MRv1 and MRv2, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and the MapReduce programming paradigm.
- Good exposure to NoSQL databases: column-oriented HBase, Cassandra, and document-based MongoDB.
- Extensive experience writing custom MapReduce programs for data processing and UDFs for both Hive and Pig in Java.
- Strong experience analyzing large data sets by writing Pig scripts and Hive queries.
- Experience importing and exporting data between HDFS and relational databases using Sqoop.
- Experienced with job workflow scheduling and monitoring tools such as Oozie and NiFi.
- Experience with Apache Flume for collecting, aggregating, and moving large volumes of data from sources such as web servers and telnet sources.
- Hands-on experience with ZooKeeper.
- Experienced in implementing unified data platforms using Kafka producers/consumers and implementing pre-processing with Storm.
- Experience developing data pipelines using Kafka, Spark, and Hive to ingest, transform, and analyze data.
- Used Kafka for log aggregation: gathering physical log files from servers and placing them in a central location such as HDFS for processing.
- Excellent understanding of PySpark and its benefits in big data analytics.
- Worked with data in multiple file formats including ORC, text/CSV, Avro, and Parquet; developed Spark code using Scala and Spark SQL for faster data processing.
- Experience in data modeling, connecting to Cassandra from Spark, and saving summarized DataFrames to Cassandra.
- Hands-on experience with stream processing frameworks such as Storm and Spark Streaming.
- Experience scheduling, distributing, and monitoring jobs using Spark Core.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Experienced with Spark Core, Spark RDDs, pair RDDs, and Spark deployment architectures.
- Developed applications using Scala, Spark SQL, and the MLlib libraries along with Kafka and other tools as required, then deployed them on the YARN cluster.
- Experience using big data with ETL tools (Talend).
- Experience using various Hadoop distributions (Cloudera, Hortonworks, MapR, etc.) to fully implement and leverage new Hadoop features. Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and compressed CSV.
- Good knowledge of Amazon AWS compute services such as EC2 web services, which provide fast and efficient processing of big data.
- Experience working with EMR for data visualization.
- Experience configuring Kerberos for the cluster.
- Experience visualizing streaming data using Apache NiFi.
- Experience using S3 for storage.
- Experienced in working with scripting technologies such as Python and UNIX shell scripts.
- Experience with source control repositories such as SVN, CVS, and Git.
- Adequate knowledge of and working experience in Agile and Waterfall methodologies.
- Experience writing database objects such as stored procedures, functions, triggers, PL/SQL packages, and cursors for Oracle, SQL Server, and MySQL databases.
- Expertise in the design and development of web and enterprise applications using technologies such as JSP, Servlets, Struts, Hibernate, Spring MVC, JDBC, Spring Boot, JMS, JSF, XML, AJAX, SOAP, and RESTful web services.
- Experience working with Apache Solr for indexing and querying.
- Developed and maintained web applications using the Tomcat web server and IBM WebSphere.
- Excellent problem-solving and analytical skills.
- Ability to quickly master new concepts and applications.
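The MapReduce programming paradigm referenced throughout the summary can be illustrated with a minimal, framework-free sketch in Python; this is a hypothetical word-count example for illustration only, not code from any of the projects listed here:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    """Reduce phase: sum all counts emitted for one key."""
    return (word, sum(counts))

def map_reduce(lines):
    """Simulate Hadoop's shuffle-and-sort between the map and reduce phases."""
    mapped = [pair for line in lines for pair in mapper(line)]
    mapped.sort(key=itemgetter(0))  # shuffle/sort: group identical keys together
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(mapped, key=itemgetter(0))
    )

print(map_reduce(["big data big wins", "data wins"]))
```

In a real Hadoop job the mapper and reducer run on different nodes and the framework performs the shuffle; the `sort`/`groupby` pair above stands in for that step.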
Hadoop technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Apache NiFi, ZooKeeper
NoSQL Databases: MongoDB, HBase, Cassandra
Real time/Stream processing: Apache Spark
Distributed message broker: Apache Kafka
Monitoring and Reporting: Tableau, Zeppelin Notebook
Hadoop Distribution: Cloudera, Hortonworks, AWS (EMR)
Build Tools: Maven, SBT
Cloud Technologies: AWS Glacier, S3
Programming & Scripting: Java, C, SQL, Shell Scripting, Python
Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/Restful services
Databases: Oracle, MySQL, MS SQL Server, Teradata
Tools & Utilities: Eclipse, NetBeans, SVN, CVS, SOAP UI, MQ Explorer, RFHUtil, JMX Explorer, SSRS, Aqua Data Studio, XML Spy, ETL (Talend, Pentaho), IntelliJ (Scala)
Confidential - Lockport, IL
Sr. Hadoop/Spark Developer
Roles and Responsibilities:
- Involved in the complete SDLC implementation, specializing in writing custom Spark and Hive programs.
- Used Spark with YARN and tuned resource usage relative to MapReduce.
- Exported the analyzed data to relational databases using Sqoop.
- Experience developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Extensively used HiveQL queries to search for particular strings in Hive tables stored in HDFS.
- Extensively used Scala for implementing required Spark APIs.
- Involved in the pilot of a Hadoop cluster hosted on Amazon Web Services (AWS).
- Expert in creating and designing data ingestion pipelines using technologies such as Apache Spark and Kafka.
- Kafka collects data from various sources in real time and performs the necessary transformations and aggregations on the fly to build the common learner data model, then persists the data in S3 buckets.
- Architected ETL pipelines on the AWS cloud using Spark on EMR.
- Extensively created EC2 instances for configuring Spark.
- Used Kafka extensively in gathering and moving log data files from application to a central location in AWS S3.
- Implemented RDDs and DataFrames using Spark Core in Scala.
- Wrote Spark SQL for processing incoming structured data.
- Used NiFi to design workflows graphically; created DAGs with NiFi and used them for further debugging.
- Developed workflows to cleanse and transform raw data into useful information to load into HDFS and NOSQL database.
- NiFi is implemented using Java.
- Extensively worked with Scala/Spark SQL for data cleansing and generating DataFrames, transforming them into row DataFrames to populate the aggregate tables in Cassandra.
- Pulled data from Cassandra into S3 buckets.
- Experience in setting S3 life cycle rules.
- Pipelining incoming data from Kafka brokers to S3 directly and running Spark on it.
- Implemented test scripts to support test driven development and continuous integration.
- Created data pipelines for different events to ingest, aggregate, and load consumer response data from an AWS S3 bucket into Hive external tables at an HDFS location.
- Created S3 buckets and managed their policies; utilized S3 and AWS Glacier for storage and backup.
- Developed UNIX shell scripts for creating reports from Hive data.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Stored the final result tables in a SQL database and forwarded them to the BI team.
- Configured Kerberos for secure communication.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
Environment: Hadoop, MapReduce, HDFS, Hue, Hive, Sqoop, Apache Kafka, Oozie, SQL, Flume, Spark, Cassandra, Scala, Java, AWS (EMR, S3), GitHub.
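The aggregation step in the pipeline above (rolling raw event rows up into per-key summaries before they land in Cassandra's aggregate tables) can be sketched in plain Python, independent of Spark. The field names (`user_id`, `amount`) are hypothetical stand-ins, since the actual schema is not stated:

```python
from collections import defaultdict

def summarize_events(events):
    """Roll raw event rows up into per-user aggregates, mirroring the
    groupBy/agg step a Spark DataFrame job would perform before the write."""
    totals = defaultdict(lambda: {"count": 0, "amount": 0.0})
    for event in events:
        agg = totals[event["user_id"]]   # hypothetical field names
        agg["count"] += 1
        agg["amount"] += event["amount"]
    return dict(totals)

rows = [
    {"user_id": "u1", "amount": 10.0},
    {"user_id": "u2", "amount": 5.0},
    {"user_id": "u1", "amount": 2.5},
]
print(summarize_events(rows))
```

In the actual job, each summary row would then be written to a Cassandra aggregate table keyed by the same grouping column.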
Confidential - Reston, MD
Sr. Hadoop/Spark developer
Roles and Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop components.
- Solid Understanding of Hadoop HDFS, Map-Reduce and other Eco-System Projects.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Knowledge of the architecture and functionality of NoSQL databases like HBase.
- Used S3 for data storage; responsible for handling huge amounts of data.
- Used EMR for data pre-analysis by creating EC2 instances.
- Used Kafka to obtain near-real-time data.
- Good experience writing data ingestion jobs with tools like Sqoop.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Implemented Spark using Scala, utilizing the DataFrame and Spark SQL APIs and pair RDDs for faster data processing; created RDDs, DataFrames, and Datasets.
- Performed batch processing using Spark implemented in Scala.
- Performed extensive data validation using Hive and wrote Hive UDFs.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Used Cloudera for Hadoop deployment of some modules.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce; wrote extensive scripting (Python and shell) to provision and spin up virtualized Hadoop clusters.
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
- Installed and configured other open source software such as Pig, Hive, HBase, Flume, and Sqoop.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Created external tables pointing to HBase to access tables with a huge number of columns.
- Involved in loading the created HFiles into HBase for faster access to a large customer base without taking a performance hit.
- Configured the Talend ETL tool for some data filtering.
- Processed the data in HBase using Apache Crunch pipelines, a MapReduce programming model that is efficient for processing Avro data formats.
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Used Tableau for data visualization and generating reports.
- Configured Kerberos for the clusters.
- Used Apache SOLR for indexing in HDFS.
- Integration with RDBMS using Sqoop and JDBC Connectors.
- Used different file formats such as text files, sequence files, Avro, and CSV.
- Extensive experience writing UNIX shell scripts and automating ETL processes with UNIX shell scripting.
Environment: UNIX, Linux, Java, Apache HDFS, MapReduce, Spark, Pig, Hive, HBase, Flume, Sqoop, NoSQL, AWS (S3 buckets), EMR cluster, SOLR.
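Converting a Hive/SQL query into Spark transformations, as in the role above, essentially maps the WHERE clause to a filter and the SELECT projection to a map. A framework-free Python sketch of that translation, using a hypothetical employee schema rather than any actual project data:

```python
# Hypothetical SQL being translated:
#   SELECT name, salary * 1.1 AS salary FROM employees WHERE dept = 'eng'
employees = [
    {"name": "ann", "dept": "eng", "salary": 100.0},
    {"name": "bob", "dept": "ops", "salary": 90.0},
    {"name": "cho", "dept": "eng", "salary": 80.0},
]

# WHERE clause -> filter(); SELECT projection -> map()
# (in Spark this would be rdd.filter(...).map(...) or df.where(...).select(...))
result = [
    {"name": e["name"], "salary": e["salary"] * 1.1}
    for e in employees
    if e["dept"] == "eng"
]
print(result)
```

The same shape carries over directly to RDD `filter`/`map` chains or DataFrame `where`/`select` calls; only the execution engine changes.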
Confidential - West Des Moines, IA
Roles and Responsibilities:
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Imported data from MySQL into HDFS using Sqoop.
- Imported unstructured data into HDFS using Flume. Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
- Wrote MapReduce Java programs to analyze log data for large-scale data sets.
- Involved in using the HBase Java API in a Java application.
- Automated all the jobs for extracting data from different data sources such as MySQL and pushing the result set data to the Hadoop Distributed File System.
- Customized the parser loader application for data migration to HBase.
- Developed Pig Latin scripts to extract the data from the output files to load into HDFS.
- Developed custom UDFs and implemented Pig scripts.
- Implemented MapReduce jobs using the Java API as well as Pig Latin and HiveQL.
- Participated in the setup and deployment of the Hadoop cluster.
- Hands-on design and development of an application using Hive (UDFs).
- Responsible for writing Hive queries to analyze data in the Hive warehouse using the Hive Query Language.
- Imported and exported data from MySQL/Oracle to Hive using Sqoop.
- Used Pig and Hive on top of HCatalog tables to analyze the data and created the schema for the HBase table in Hive. Configured an HA cluster for both manual and automatic failover.
- Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based MapReduce.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing log files.
- Created a SOLR schema from the indexer settings.
- Experience writing SOLR queries for various search documents.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in developing Hive queries for the analysts.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Hortonworks, Hadoop, HDFS, Oozie, Pig, Hive, MapReduce, Sqoop, SOLR, HBase and Linux
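The log-analysis MapReduce jobs above reduce to parsing each log line and counting by a key such as the status code. A minimal Python sketch of that parsing logic; the log-line format here is hypothetical, since the actual format is not stated:

```python
import re
from collections import Counter

# Hypothetical access-log line format: "GET /index.html 200"
LOG_PATTERN = re.compile(r"^(?P<method>\S+) (?P<path>\S+) (?P<status>\d{3})$")

def status_counts(lines):
    """Count requests per HTTP status code, skipping malformed lines
    (the same skip-bad-records behavior a production mapper would need)."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            counts[match.group("status")] += 1
    return counts

logs = ["GET /index.html 200", "GET /missing 404", "POST /api 200", "garbage"]
print(status_counts(logs))
```

At scale, the same parse-and-count would run as the map phase, with the framework summing the per-status counts in the reduce phase.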
Confidential - Springfield, IL
Big Data/Hadoop Developer
Roles and Responsibilities:
- Involved in creating Hive tables, loading them with data, and analyzing the data using Hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Used the Cloudera QuickStart VM for deploying the cluster.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Mentored the analyst and test teams in writing Hive queries.
- Analyzed the data by performing Hive queries and running Pig scripts to validate data.
- Generated the datasets and loaded them into the Hadoop ecosystem.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries, Pig scripts, and Sqoop jobs.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Worked with Linux systems and RDBMS databases on a regular basis in order to ingest data using Sqoop.
- Used Sqoop, Pig, and Hive as ETL tools for pulling and transforming data.
- Experience developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Worked with MongoDB for developing and implementing programs in the Hadoop environment.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Load and transform large sets of structured, semi-structured and unstructured data.
- Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, and JBoss.
Environment: Cloudera, Hadoop, HDFS, Spark, Oozie, Pig, Hive, MapReduce, Sqoop, MongoDB, Linux, Core Java, SOAP, XML, JMS, JBoss.
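Hadoop streaming jobs like the ones mentioned above are typically small Python scripts that read lines and emit tab-separated key/value pairs, with Hadoop handling the shuffle between them. A hedged sketch of such a mapper/reducer pair, simplified to functions over line iterables rather than stdin/stdout so it is self-contained:

```python
def mapper(lines):
    """Streaming mapper: emit 'word<TAB>1' for every word.
    A real streaming mapper would read sys.stdin and print each pair."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Streaming reducer: sum counts for consecutive identical keys.
    Hadoop delivers reducer input sorted by key, which this relies on."""
    current, total = None, 0
    for line in sorted_lines:
        key, value = line.rsplit("\t", 1)
        if key != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = key, 0
        total += int(value)
    if current is not None:
        yield f"{current}\t{total}"

# sorted() stands in for Hadoop's shuffle-and-sort between the two phases:
print(list(reducer(sorted(mapper(["to be or not to be"])))))
```

On a cluster, the two functions would live in separate scripts passed via `-mapper` and `-reducer` to the hadoop-streaming jar, and XML records would be parsed inside the mapper.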
Roles and Responsibilities:
- Developed user interfaces using JSP, Struts custom tags, and HTML.
- Implemented the Model View Controller (MVC) architecture using the Struts framework.
- Used the Struts framework for dependency injection and integrated it with Hibernate.
- Used XML parser APIs such as JAXB in the web service's request/response data for marshalling and unmarshalling.
- Developed helper classes to validate data against a set of business rules.
- Implemented the data persistence functionality of the application by using JPA to persist Java objects with the Oracle database.
- Implemented Messaging using JMS and Message Driven Beans.
- Used SOAP-based XML web services to obtain the credit-based insurance score from the information contained in the credit report obtained from an authentic credit bureau.
- Extensively used Eclipse for writing code.
- Used Log4j for logging and debugging, and used JUnit extensively for testing.
- Used CVS for version control.
- Used WebLogic Application Server for deploying various components of application.
Environment: Java, Struts, Hibernate, JSP, SOAP, CVS, WebLogic, Oracle, Maven, Log4j, Sql Developer, Jira, Eclipse.
Roles and Responsibilities:
- Analyzed, Designed and developed the system to meet the requirements of business users.
- Participated in the design review of the system to perform Object Analysis and provide best possible solutions for the application
- Implemented the presentation tier using HTML, JSP, Servlets, and AJAX frameworks.
- Used AJAX to implement part of the functionality for the Customer Registration and View Customer Information modules.
- Implemented the Struts MVC framework for developing a J2EE-based web application.
- Used JDBC to connect to and access the database.
- Used IBM WebSphere to deploy J2EE application components.
- The database tier used SQL Server.
- Developed JUnit test cases.