- Over 8 years of IT experience in software analysis, design, development, testing and implementation of Big Data, Hadoop and NoSQL technologies.
- Experienced in installing, configuring and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Flume, YARN, Spark, Kafka and Oozie.
- Strong understanding of Hadoop daemons and MapReduce concepts.
- Strong experience in importing and exporting data into and out of HDFS.
- Expertise in Java and Scala.
- Experienced in developing UDFs for Hive using Java.
- Worked with Apache Falcon, a data governance engine that defines, schedules and monitors data management policies.
- Experience with AWS services such as EMR, EC2, S3, CloudFormation, Redshift and DynamoDB for fast and efficient Big Data processing.
- Hands-on experience with Hadoop, HDFS, MapReduce and the Hadoop ecosystem (Pig, Hive, Oozie, Flume and HBase).
- Hands-on experience developing Spark applications using RDD transformations, Spark Core, Spark Streaming and Spark SQL.
- Strong knowledge of NoSQL databases such as HBase, MongoDB and Cassandra.
- Experience working with Angular 4, Node.js, Bookshelf, Knex and MariaDB.
- Understanding of data storage and retrieval techniques, ETL and databases, including graph stores, relational databases and tuple stores.
- Skilled in developing reusable solutions to maintain consistent coding standards across Java projects.
- Good knowledge of Python collections, Python scripting and multi-threading.
- Wrote multiple MapReduce programs in Python for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed formats.
- Experience with EJB, Hibernate, Java Web Services, SOAP, REST services, Java threads, Java sockets, Java Servlets, JSP and JDBC.
- Proficient in the R programming language for data extraction, cleaning, loading, transformation and visualization.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn and NLTK in Python to develop machine learning models, applying algorithms such as linear regression.
- Expertise in debugging, optimization and performance tuning of Oracle and Java applications, with strong knowledge of Oracle 11g and SQL.
- Ability to work effectively in cross-functional team environments; experience providing training to business users.
- Good experience using Sqoop to pull data from traditional RDBMS.
- Good working knowledge of Flume.
- Worked with Apache Ranger console to create and manage policies for access to files, folders, databases, tables, or columns.
- Worked with YARN Queue Manager to allocate queue capacities for different service accounts.
- Hands-on experience with Hortonworks and Cloudera Hadoop environments.
- Familiar with handling complex data processing jobs using Cascading.
- Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers and cursors.
- Extensive experience in Shell scripting.
- Led testing efforts in support of projects/programs across a large landscape of technologies (Unix, AngularJS, AWS, Sauce Labs, Cucumber-JVM, MongoDB, GitHub, SQL, NoSQL databases, APIs, Java, Jenkins).
- Automated testing with Cucumber-JVM to build a world-class ATDD (acceptance test-driven development) process.
- Set up JDBC connections for database testing using the Cucumber framework.
- Experience in component design using UML use case, class, sequence, deployment and component diagrams for the requirements.
- Expertise in installing, configuring, supporting and managing Hadoop clusters using Apache, Cloudera (CDH3, CDH4) and Hortonworks distributions and on Amazon Web Services (AWS).
- Excellent analytical and programming abilities in using technology to create flexible and maintainable solutions for complex development problems.
- Good communication and presentation skills; willing to learn and adapt to new technologies and third-party products.
Languages/Tools: Java, C, C++, C#, Scala, VB, XML, HTML/XHTML, HDML, DHTML.
Big Data: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Spark, Mahout, Kafka, Storm, Cassandra, Solr, Impala, Greenplum, MongoDB
Web/Distributed Technologies: J2EE, Servlets 2.1/2.2, JSP 2.0, Struts 1.1, Hibernate 3.0, JSF, JSTL 1.1, EJB 1.1/2.0, RMI, JNI, XML, JAXP, XSL, XSLT, UML, MVC, Spring 2.0, CORBA, Java Threads.
Browser Languages/Scripting: HTML, XHTML, CSS, XML, XSL, XSD, XSLT, JavaScript, HTML DOM, DHTML, AJAX.
App/Web Servers: IBM WebSphere …, BEA WebLogic 5.1/7.0, JDeveloper, Apache Tomcat, JBoss.
GUI Environment: Swing, AWT, Applets.
Messaging & Web Services Technology: SOAP, WSDL, UDDI, XML, SOA, JAX-RPC, IBM WebSphere MQ v5.3, JMS.
Testing & CASE Tools: JUnit, Log4j, Rational ClearCase, CVS, Ant, Maven, JBuilder.
Configuration Management: Chef, Puppet, Ansible, Docker.
Build Tools: CVS, Subversion, Git, Ant, Maven, Gradle, Hudson, TeamCity, Jenkins, Chef, Puppet, Ansible, Docker.
CI Tools: Jenkins, Bamboo.
Scripting Languages: Python, Shell (Bash), Perl, PowerShell, Ruby, Groovy.
Monitoring Tools: Nagios, CloudWatch, JIRA, Bugzilla and Remedy.
Databases: Oracle, MS SQL Server 2000, DB2, MS Access, MySQL, Teradata, Greenplum; NoSQL: Cassandra and MongoDB.
Operating Systems: Windows, Solaris, Unix, Linux (Red Hat 5.x, 6.x, 7.x; SUSE Linux 10), Ubuntu, CentOS.
Confidential, New Jersey
Big Data Developer
- As a Big Data/Hadoop Developer, worked on Hadoop ecosystem components including Hive, MongoDB, ZooKeeper and Spark Streaming on the MapR distribution.
- Developed Big Data solutions focused on pattern matching and predictive modelling
- Participated in Agile ceremonies, including daily scrum meetings and sprint planning.
- Ran multiple MapReduce jobs through Pig and Hive for data cleaning and pre-processing.
- Worked on MongoDB and HBase, NoSQL databases that differ from classic relational databases.
- Converted HiveQL queries into Spark transformations using Spark RDDs and Scala.
- Maintained and worked with a data pipeline that transfers and processes several terabytes of data using Spark, Scala, Python, Apache Kafka, Pig/Hive and Impala.
- Integrated Kafka with Spark Streaming for high throughput and reliability.
- Worked on Apache Flume for collecting and aggregating large amounts of log data and storing it in HDFS for further analysis.
- Led large-scale Big Data projects from inception to completion using Python, Scala, Spark and Hive.
- Tuned Hive and Pig scripts to improve performance and resolved performance issues in both.
- Built Hadoop solutions for big data problems using MR1 and MR2 (YARN).
- Developed NiFi flows handling various data formats such as XML, JSON and Avro.
- Designed and developed data integration and migration solutions in Azure.
- Worked on a proof of concept with Spark (Scala) and Kafka.
- Worked on visualizing the aggregated datasets in Tableau.
- Transferred data from HDFS to a MySQL database and vice versa using Sqoop.
- Implemented MapReduce jobs through Hive queries on the available data.
- Configured the Hive metastore with MySQL to store the metadata for Hive tables.
- Performance-tuned Hive queries and MapReduce programs for different applications.
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Handled importing data from various sources, performed transformations using Hive and Pig, and loaded the data into HDFS.
- Identified job dependencies to design workflows for Oozie and YARN resource management.
- Designed solutions for various system components using Microsoft Azure.
- Worked on data ingestion with Sqoop from HDFS to relational database systems and vice versa, including maintenance and troubleshooting.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Created Hive tables, loaded claims data from Oracle using Sqoop, and loaded the processed data into the target database.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Used Cloudera Manager for installation and management of the Hadoop cluster.
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Collaborated with business users/product owners/developers to contribute to the analysis of functional requirements.
- Primarily involved in the data migration process using Azure, integrating with a GitHub repository and Jenkins.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster and integrated Hive with existing applications.
- Designed and developed a flattened view (merged and flattened dataset) by de-normalizing several datasets in Hive/HDFS; the view contains key attributes consumed by the business and other downstream systems.
- Supported enterprise NoSQL production and loaded data into HBase using Impala and Sqoop.
- Analyzed the Hadoop cluster and different big data analytics tools, including Pig, HBase and Sqoop.
Environment: Agile, Hadoop, Pig, HBase, Sqoop, Azure, Hive, HDFS, NoSQL, Impala, YARN, PL/SQL, NiFi, XML, JSON, Avro, Spark, Kafka, Tableau, MySQL, Apache Flume.
Confidential, Overland Park, Kansas
Big Data Developer
- Implemented a Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) in the cloud.
- Developed Big Data solutions focused on pattern matching and predictive modeling, utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase and Hive.
- Experience in AWS, implementing solutions using services such as EC2, S3, RDS, Redshift and VPC.
- Worked as a Hadoop consultant on MapReduce, Pig, Hive and Sqoop.
- Configured performance tuning and monitoring for Cassandra read and write processes to achieve fast I/O and low latency.
- Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig and MapReduce.
- Analyzed the Hadoop cluster and different Big Data components, including Pig, Hive, Spark, HBase, Kafka, Elasticsearch, databases and Sqoop.
- Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
- Troubleshot, debugged and resolved Talend issues while maintaining the health and performance of the ETL environment.
- Performed data profiling and transformation on the raw data using Pig and Python.
- Experienced with batch processing of data sources using Apache Spark.
- Developed predictive analytics using the Apache Spark Scala APIs.
- Created Hive external tables, loaded data into them and queried the data using HQL.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Created dashboards in Tableau, and in Elasticsearch with Kibana.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Experience with BI reporting using AtScale OLAP for Big Data.
- Responsible for importing log files from various sources into HDFS using Flume
- Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
- Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text files, sequence files and Parquet files.
- Loaded data from different sources (databases and files) into Hive using Talend.
- Extracted data from MySQL and Amazon Redshift into HDFS using Sqoop.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Imported millions of structured records from relational databases using Sqoop for processing with Spark, and stored the data in HDFS in CSV format.
- Used Spark SQL to process large volumes of structured data.
- Implemented a Spark GraphX application to analyze guest behavior for data science segments.
- Enhanced a traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
Confidential - Bronx, NY
- Executed Hive queries that helped in analysis of market trends by comparing the new data with EDW reference tables and historical data.
- Managed and reviewed Hadoop log files (JobTracker, NameNode, Secondary NameNode, DataNode and TaskTracker).
- Tested raw market data and executed performance scripts on data to reduce the runtime.
- Loaded the created files into HBase for faster access to large sets of customer data without affecting performance.
- Imported and exported data between HDFS and RDBMS using Sqoop and Kafka.
- Executed Hive jobs to parse logs and structure them in relational format, enabling effective queries on the log data.
- Created Hive tables (internal/external) for loading data and wrote queries that run internally as MapReduce jobs to process the data.
- Developed Pig scripts for change data capture and record processing between new data and existing data in HDFS.
- Created scalable, performant machine learning applications using Mahout.
- Populated HDFS and Cassandra with large amounts of data using Apache Kafka.
- Imported data from different data sources and performed various queries using Hive, MapReduce and Pig Latin.
- Loaded data from the local file system into HDFS using HDFS shell commands.
- Wrote UNIX shell scripts to process and load data from various interfaces into HDFS.
- Developed components of Hadoop ecosystem processes involving MapReduce and Hive.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Big Data, Java, Flume, Kafka, YARN, HBase, Oozie, SQL scripting, Linux shell scripting, Mahout, Eclipse and Cloudera.
Confidential - Calverton, MD
- Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Spark, Avro, ZooKeeper, etc.) on Cloudera's Distribution of Hadoop (CDH4).
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and processing.
- Involved in installing Hadoop Ecosystem components.
- Imported and exported data into HDFS, Pig, Hive and HBase using Sqoop.
- Responsible for managing data coming from different sources, including Flume and relational database management systems via Sqoop.
- Involved in gathering requirements, design, development and testing.
- Worked on loading and transforming large sets of structured and semi-structured data into the Hadoop system.
- Developed simple and complex MapReduce programs in Java for data analysis.
- Loaded data from various data sources into HDFS using Flume.
- Developed Pig UDFs to pre-process the data for analysis.
- Worked on Hue interface for querying the data.
- Created Hive tables to store the processed results in a tabular format.
- Developed Hive scripts implementing dynamic partitions.
- Developed Pig scripts for data analysis and extended their functionality by developing custom UDFs.
- Extensive knowledge of Pig scripts using bags and tuples.
- Experience in managing and reviewing Hadoop log files.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Exported analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
Environment: Hadoop (CDH4), UNIX, Eclipse, HDFS, Java, MapReduce, Apache Pig, Hive, HBase, Oozie, Sqoop and MySQL.
- Actively involved in normalization and de-normalization of databases.
- Installed and Configured SQL Server 2000 on servers for designing and testing.
- Designed DDL and DML for MS SQL Server 2000/2005.
- Used SQL Server Integration Services (SSIS) to populate data from various data sources.
- Developed web-based front-end screens using MS FrontPage, HTML and JavaScript.
- Designed the database to speed up certain daily jobs and stored procedures.
- Optimized query performance by creating indexes.
- Involved in writing SQL batch scripts.
- Created scripts for tables, stored procedures, and DTS and SSIS packages.
- Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries.
- Involved in merging existing databases and designed new data models to meet the requirements.
- Created joins and sub-queries for complex queries involving multiple tables.
- Wrote triggers and stored procedures, and used DML for data manipulation.
- Took full database backups, transaction log backups and differential backups as a daily routine.
- Monitored production server activity.
- Worked with DTS packages to load the massaged data into the data warehousing system.
- Tuned SQL queries using SQL Profiler and was involved in database tuning.
- Proactively identified problems before users reported them.
Environment: Windows 2003 server, SQL Server 2000/2005, SSIS, FrontPage, IIS 5.0
- Worked on developing a functional SQL database system that stored customer financial records
- Requirements were provided in an Excel spreadsheet, which I configured and developed into an SQL database.
- Implemented several mathematical functions in the database that the user could call at any time.
- Using the developed database, I designed a user-friendly interface through Java JDBC that workers could easily understand when manipulating customers' financial data.
- Gained experience working with the customer throughout the development life cycle, which allowed me to focus on customer satisfaction based on their needs and wants.