- 8+ years of professional IT experience, including 4 years of comprehensive experience with Apache Hadoop ecosystem components and Spark Streaming.
- 4+ years of development experience with Big Data Hadoop ecosystem components and related tools, covering data ingestion, import, export, storage, querying, pre-processing, and analysis of big data.
- Strong working expertise in handling terabytes of structured and unstructured data in large cluster environments.
- Experience in using SDLC methodologies like Waterfall, Agile Scrum, and TDD for design and development.
- Expertise in implementing Spark modules and tuning its performance.
- Expertise in Kerberos Security Implementation and securing the cluster.
- Expertise in creating Hive internal/external tables and views using a shared metastore and writing scripts in HiveQL; experience in data transformation, file processing, and building analytics using Pig Latin scripts.
- Expertise in writing custom UDFs to extend Pig and Hive core functionality.
- Developed, deployed, and supported several MapReduce applications in Java to handle different types of data.
- Worked with various serialization and compression formats such as Avro, Snappy, and LZO.
- Hands-on experience with Avro and Parquet file formats, following best practices and improving performance using partitioning, bucketing, map-side joins, and indexes.
- Expert in implementing advanced procedures such as text analytics and processing, and in implementing streaming APIs using the in-memory computing capabilities of Apache Spark, written in Scala and Python.
- Experience in data load management, importing and exporting data between HDFS and relational and non-relational database systems using Sqoop, Flume, and Apache NiFi, with efficient column mappings and consistent uniformity.
- Exported data to various databases such as Teradata (Sales Data Warehouse), SQL Server, and Cassandra using Sqoop.
- Experienced in creating shell scripts to push data loads from various sources from the edge nodes onto the HDFS.
- Experienced in performing code reviews and closely involved in smoke testing and retrospective sessions.
- Experience in scheduling and monitoring jobs using Oozie and Crontab.
- Experienced in Microsoft Business Intelligence tools, developing with SSIS (Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services), and building Key Performance Indicators and OLAP cubes.
- Hands-on experience in creating cubes for various reports for end clients; configured data sources and data source views, dimensions, cubes, measures, partitions, KPIs, and MDX queries.
- Good exposure to star and snowflake schemas and data modelling, with experience on different data warehouse projects.
- Hands-on experience with the reporting tool Tableau, creating attractive dashboards and worksheets.
- Extensive work experience with Java/J2EE technologies such as Servlets, JSP, EJB, JDBC, JSF, Struts, Spring, SOA, AJAX, XML/XSL, Web Services (REST, SOAP), UML, Design Patterns, and XML Schemas.
- Strong experience in the design and development of relational databases with multiple RDBMS databases, including Oracle 10g, MySQL, and MS SQL Server, and with PL/SQL.
- Experience in Java, J2EE, web services, SOAP, HTML, and XML related technologies.
- Have closely worked with the technical teams, business teams and product owners.
- Strong analytical and problem-solving skills and ability to follow through with projects from inception to completion.
- Ability to work effectively in cross-functional team environments, excellent communication and interpersonal skills.
NoSQL Databases: HBase, Cassandra, MongoDB
Monitoring and Reporting: Tableau, Custom Shell Scripts
Hadoop Distributions: Hortonworks, Cloudera, MapR
Build Tools: Maven, SQL Developer
Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/REST services
Databases: Oracle, MySQL, MS SQL Server, Vertica, Teradata
Analytics Tools: Tableau, Microsoft SSIS, SSAS and SSRS
IDE Dev. Tools: Eclipse 3.5, NetBeans, MyEclipse, Oracle JDeveloper 10.1.3, SoapUI, Ant, Maven, RAD
Operating Systems: Linux, Unix, Windows 8, Windows 7, Windows Server 2008/2003
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, ZooKeeper, Ambari, Storm, Spark, Kafka, and Apache NiFi
Network protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Sr. Hadoop Spark Developer
- Worked on different file formats such as Sequence files, XML files, JSON files, and Map files using MapReduce programs.
- Made key decisions on how long the poll time should be for stream processing and which Hadoop stack components to use for better performance.
- Proposed and implemented a solution for the long-standing issue of ordering data in Kafka queues.
- Designed and implemented an ETL framework using Sqoop, Pig, and Hive to automate the process of frequently bringing in data from the source and making it available for consumption.
- Worked on importing and exporting data into HDFS and Hive using Sqoop, and built analytics on Hive tables using HiveContext.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Loaded and transformed large sets of semi-structured and unstructured data on HBase and Hive.
- Implemented the MapReduce programming model with XML, JSON, and CSV file formats. Made use of SerDe JARs to load JSON and XML data coming from Kafka queues onto Hive tables.
- Implemented UDFs for Hive by extending the Generic UDF, UDTF, and UDAF base classes to change time zones, implement logic actions, and extract required parameters according to the business specification.
- Extensive working knowledge of partitioning, UDFs, performance tuning, and compression-related properties on Hive tables.
- Developed the UNIX shell scripts for creating the reports from Hive data.
- Implemented Spark scripts in Python to extract the required data from the data sets and store it on HDFS.
- Developed Spark scripts and Python functions that perform transformations and actions on data sets.
- Configured Spark Streaming in Python to receive real-time data from Kafka and store it on HDFS.
- Experienced in building analytics on top of Spark using the machine learning package spark.ml.
- Involved in optimizing the Hive queries using Map-side join, Partitioning, Bucketing and Indexing.
- Involved in tuning the Spark modules with various memory and resource allocation parameters, setting the right batch interval, and varying the number of executors to meet the increasing load over time.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Used Hue for UI-based Pig script execution, Tidal scheduling, and creating tables in Hive.
- Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
- Involved in iteration planning under the Agile Scrum methodology.
- Extensive knowledge of working with Apache NiFi; used and configured different processors to pre-process the incoming data, make it uniform, and format it according to the requirements.
- Implemented unit testing in Java for Pig and Hive applications.
- Hands-on experience with AWS cloud services such as Redshift clusters and Route 53 domain configuration.
- Extensively used UNIX for shell scripting and for pulling logs from the server and monitoring them.
- Worked with the Data Science team to gather requirements for various data mining projects.
Sr. Hadoop Developer
- Worked on extracting data from the Oracle database and loading it into the Hive database.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly on data from Kafka queues in real time, and persisted it to Cassandra using the required connectors and drivers.
- Integrated Kafka, Spark, and Cassandra for streaming analytics to create a predictive model.
- Developed Scala scripts and UDFs using DataFrames in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Worked on modifying and executing UNIX shell scripts for processing data and loading it to HDFS.
- Worked extensively on optimizing transformations for better performance.
- Involved in important design decisions on creating UDFs and on partitioning the data in Hive tables at two different levels, based on the related columns, for efficient retrieval and processing of queries.
- Tuned many options to boost performance, such as trying different executor counts and memory options.
- Took part in the team's maintenance work, adding the feature of stable time zones across all records in the database.
- Uploaded and processed more than 30 terabytes of data from various structured and unstructured, heterogeneous sources into HDFS using Sqoop and Flume, enforcing and maintaining uniformity across all the tables.
- Developed complex transformations using HiveQL to build aggregate/summary tables.
- Used Solr/Lucene for indexing and querying the JSON formatted data.
- Implemented unit tests for Pig and Hive applications.
- Developed UDFs in Python to implement functions according to the specifications.
- Developed Spark scripts configured according to business logic, with good knowledge of the available actions.
- Well versed with the HL7 international standards as the data was organized according to this format.
- Formatted and built analytics on top of the data sets that complied with HL7 standards.
- Created UDFs in Scala and Java for formatting and applying transformations on the information in HL7 versions.
- Analyzed the JSON data using the Hive SerDe API to deserialize and convert it into a readable format.
- Involved in increasing and optimizing the performance of the application using Partitioning and Bucketing on Hive tables, developing efficient queries by using Map-side joins and Indexes.
- Worked with the downstream team to generate reports in Tableau.
Environment: CDH 5.1.x, Hadoop 2.2.0, HDFS, MapReduce, Sqoop, Flume, Hive 2.0.x, SQL Server, TOAD, Oracle, Scala 2.9.1, Solr/Lucene, PL/SQL, Eclipse, Java, Shell scripting, Vertica, Unix, Cassandra, HL7 standard.
Sr. Hadoop Developer
- Involved in architecture design, development and implementation of Hadoop deployment, backup and recovery systems.
- Developed MapReduce programs in Python using the Hadoop Streaming API to parse the raw data, populate staging tables, and store the refined data in partitioned Hive tables.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Converted applications built on the MapReduce architecture to Spark using the Python API, which performed the business logic.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Imported Teradata datasets onto the Hive platform using Teradata JDBC connectors.
- Involved in writing FastLoad and MultiLoad scripts to load the tables.
- Worked with different types of indexes and Collect Statistics in Teradata to improve the execution strategy.
- Worked with SQL Assistant and BTEQ to ingest data, execute queries and stored procedures, and update the tables.
- Worked on extracting XML files using XPath and storing them in Hive tables.
- Developed multiple Kafka Producers and Consumers as per the software requirement specifications.
- Involved in designing the tables in Teradata while importing the data.
- Developed the UNIX shell scripts for creating the reports from Hive data.
- Experienced in managing and reviewing the Hadoop log files.
- Developed Hive jobs to parse the logs, structure them in tabular format to facilitate effective querying on the log data.
- Extensively used UNIX for shell Scripting and pulling the Logs from the Server.
- Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
- Worked with Avro Data Serialization system to work with JSON data formats.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Completed integration testing and tracked and resolved defects.
- Used AWS services such as EC2 and S3 for small data sets.
Environment: Hadoop Hortonworks 2.2, Hive, Pig, HBase, Sqoop, Flume, Oozie, AWS, S3, EC2, EMR, Spring, Kafka, SQL Assistant, Python, UNIX, Teradata
Jr. Hadoop Developer
- Supported and monitored MapReduce programs running on the cluster.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines to develop programs.
- Configured the Hadoop cluster with NameNode, JobTracker, and TaskTrackers on slave nodes and formatted HDFS.
- Used the Oozie workflow engine to schedule and execute multiple Hive, Pig, and Spark jobs by passing arguments.
- Involved in creating Hive tables, loading data, and writing Hive queries to invoke and run MapReduce jobs in the backend.
- Designed and implemented incremental imports into Hive tables using Sqoop and cleaned up the staging tables and files.
- Involved in collecting, pre-processing, aggregating, and moving data from servers to HDFS using Apache Flume.
- Developed multiple MapReduce jobs in Python for data cleaning and preprocessing.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Hands-on experience working with Sequence files, Avro, and HAR file formats and compression.
- Used partitioning and bucketing in Hive tables to increase performance and parallelism.
- Experience in writing MapReduce programs in Python to cleanse structured and unstructured data.
- Wrote Pig scripts to perform ETL procedures on the data in HDFS.
- Implemented Hive and Pig scripts to analyze large data sets.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data onto HBase column families.
- Created HBase tables to store data coming from different portfolios and created Bloom filters on column families for efficiency.
- Worked on improving the performance of existing Pig and Hive queries through optimization.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Deployed and worked with Apache Solr search engine server to help speed up the search of the sales and production data.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the data pertaining to different states across the northern USA.
- Developed workflows in Oozie to automate the tasks of loading the data into HDFS and pre-processing it with Pig Latin scripts, and stored these intermediate results for further analytics.
- Migrated ETL jobs to Pig scripts to perform the Transformations, joins and pre-aggregations before storing data onto HDFS.
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Coordinated with offshore and onsite teams to understand and propagate the requirements and to prepare design documents from the requirements specification and architectural designs.
- Worked with application teams to install Operating Systems, Hadoop updates, patches, and version upgrades as required.
- Used Microsoft Visio and Rational Rose to design the use case diagrams, class models, sequence diagrams, and activity diagrams for the SDLC process of the application.
- Configured the project on WebSphere 6.1 application servers.
- Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
- Used Log4J logging framework to write Log messages with various levels.
- Involved in fixing bugs and minor enhancements for the front-end modules.
- Performed live demos of functional and technical reviews.
- Supported the testing team with system, integration, and UAT testing.
- Ensured quality in the deliverables to the product owners and business team.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Implemented Action classes and server-side validations for account activity, registration, and transaction history.
- Designed a user-friendly GUI and web pages using HTML, CSS, Struts, and JSP.
- Involved in writing client-side scripts using JavaScript and server-side scripts using JavaBeans.
- Involved in various stages of enhancements to the application, performing the required analysis, development, and testing.
- Prepared the high-level and low-level design documents and generated digital signatures.
- Created use cases and class and sequence diagrams for the analysis and design of the application.
- Developed the logic and code for the registration and validation of enrolling customers.
- Developed web-based user interfaces using the Struts framework.
- Coded and deployed JDBC connectivity in the Servlets to access the Oracle database tables on Tomcat web-server.
- Involved in the integration of various Struts actions in the framework.
- Used the Validation Framework for server-side validations.
- Created test cases for unit and integration testing.