- A diligent engineering professional with substantial experience in the software industry, spanning application development and big data engineering for complex business problems involving large-scale data. Eager to keep learning and upgrading my skills to build intuitive architectures that help organizations process data effectively for meaningful business insights.
- Five years of experience with Confidential Technologies in the Hadoop ecosystem and in application development, enhancements, and support for Core Java and PL/SQL applications in the logistics domain.
- 3+ years of experience designing big data applications, with extensive hands-on experience in Hadoop ecosystem components. Led efforts from creating POCs and demoing them to stakeholders through design, development, and implementation in production environments.
- Implemented ETL processes in the Hadoop ecosystem using HDFS, Hive, Impala, Pig, Sqoop, Flume, and MapReduce.
- Developed POC pipelines using Apache Spark for streaming and micro-batch processing. Implemented data processing pipelines using RDDs, DataFrames, Datasets, Spark SQL, Spark Streaming, and AWS cloud services.
- Experience working with JIRA, Bitbucket (Git), Confluence, and SVN.
- Proficient in Confidential's data migration process and in agile and waterfall methodologies, tools, and techniques for migrating data from legacy systems to HDFS using Hadoop tooling.
- Designed and implemented data strategy projects to integrate several types of semi-structured third-party data into enterprise system and analytical solutions to improve business performance.
- Sun Certified Java Developer; ITSM and RTB certified internally within the organization.
- Developed data pipelines to read and transform data in batch and streaming modes, and Hadoop MapReduce jobs to ingest large volumes of data on the Cloudera distribution.
- Loaded and transformed large data sets of structured, semi-structured, and unstructured data using Hadoop/big data concepts.
- Implemented ETL processes for importing and exporting data with Sqoop between HDFS, UNIX, and relational database systems.
- Created sub-queries for filtering and tuned multi-table joins for faster execution. Developed ETL test scripts based on technical specifications, data design documents, and source-to-target mappings.
- Set up automated monitoring and escalation infrastructure for the Hadoop cluster using Nagios. Worked with various file formats including SequenceFile, RCFile, Avro, and Parquet.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile, and network devices, and pushed it to HDFS.
- Analyzed web log data with HiveQL to extract unique visitors per day, page views, visit duration, and the most-purchased items per store.
- Exported the analyzed data to relational databases for visualization and report generation by the BI team. Managed and reviewed Hadoop log files; tested raw data and executed performance scripts.
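The web-log analysis above can be sketched as a Hadoop Streaming-style job in Python. This is an illustrative sketch only: the tab-separated log layout (timestamp, visitor id, URL) and the function names are assumptions, not the actual production format.

```python
# Sketch of a Hadoop Streaming-style job: unique visitors per day.
# Assumed log format (tab-separated): ISO timestamp, visitor id, URL.
from collections import defaultdict

def mapper(lines):
    """Emit (date, visitor_id) pairs from raw web-log lines."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip malformed records
        timestamp, visitor = parts[0], parts[1]
        yield timestamp[:10], visitor  # keep the date portion of the timestamp

def reducer(pairs):
    """Count distinct visitors per date."""
    visitors = defaultdict(set)
    for date, visitor in pairs:
        visitors[date].add(visitor)
    return {date: len(ids) for date, ids in visitors.items()}

if __name__ == "__main__":
    sample = ["2016-03-01T10:00:00\tu1\t/home",
              "2016-03-01T11:00:00\tu2\t/cart"]
    print(reducer(mapper(sample)))  # {'2016-03-01': 2}
```

In a real Streaming job the mapper and reducer would run as separate scripts reading stdin and writing tab-separated key/value lines; they are kept as pure functions here so the logic is easy to test.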
Hadoop Ecosystem: HDFS, Spark, Pig, Hive, Sqoop, HBase, Impala, MapReduce and Flume
Languages/IDEs: Python, Core Java, SQL, PL/SQL, Scala / Eclipse, Jupyter
Real-time Processing: Apache Spark, Oozie
Application Servers: Oracle Application Server, Apache Tomcat, MySQL Server
Cloud Platform/Distribution: AWS, Cloudera
Interface Tools: WinSCP, PuTTY, EAMS Integration tool, Infor EAMS Upload utility tool
DBMS/NoSQL: Oracle, SQL Server 2005/2008, TOAD, HBase
Scripting: Shell Scripting
Version Control/Reporting Tools: Perforce, SVN, Git, MS Excel, BI Reports, Nagios
Others: JSON, XML, Notepad++, UltraEdit, Oozie
Technologies: HDFS/Hadoop, Pig, Hive, Sqoop, HBase, Python, Shell, Cloudera, MapReduce, SQL
- Performed data transformation and development on Hadoop clusters in the Cloudera distribution.
- Built scalable distributed data solutions using Hadoop; handled cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
- Developed batch and streaming jobs to process and ingest large volumes of data while meeting data-latency requirements.
- Imported data from Teradata using Sqoop with the Teradata connector.
- Used Apache Hive to read, write, and query Hadoop data in HDFS and HBase. Implemented partitioning, dynamic partitions, and buckets in Hive.
- Developed various UNIX shell scripts to SFTP data from different environments on file arrival and to schedule and run extraction and load processes.
- Reduced data redundancy through preprocessing, which involved collecting, formatting, cleaning, aggregating, and segregating large volumes of data, then sampled the result for statistical evaluation and inferred valuable conclusions from it.
- Developed simple MapReduce jobs in Python for data cleaning and preprocessing.
- Analyzed the nature of data from different OLTP systems and designed ingestion processes for HDFS.
- Improved system performance using indexing, partitioning, and bucketing in Hive.
- Integrated and scheduled several types of Hadoop jobs, as well as system-specific big data jobs, using the Oozie scheduler.
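The Sqoop-based Teradata ingestion described above can be sketched as assembling an import command. This is a minimal illustration: the host, database, table, and target directory are hypothetical placeholders, and a real pipeline would run the command via the scheduler rather than just print it.

```python
# Sketch of building a `sqoop import` command for Teradata ingestion.
# Host, database, table, and paths below are hypothetical placeholders.
def sqoop_import_cmd(host, db, table, target_dir, mappers=4):
    """Assemble argv for `sqoop import` using a Teradata JDBC connection."""
    return [
        "sqoop", "import",
        "--connect", f"jdbc:teradata://{host}/DATABASE={db}",
        "--table", table,
        "--target-dir", target_dir,          # HDFS landing directory
        "--num-mappers", str(mappers),       # parallelism of the import
        "--as-parquetfile",                  # land data in Parquet for Hive/Impala
    ]

cmd = sqoop_import_cmd("td-host", "sales", "shipments", "/data/raw/shipments")
print(" ".join(cmd))
```

Building the command as an argv list (rather than one shell string) keeps paths and table names safe from shell quoting issues when the job is launched from a wrapper script.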
Java Application Development/Support Engineer
- Developed Model-View-Controller (MVC) components and stored procedures to retrieve shipment data from the database, including modules such as FORD Routing for automating the routing details of international shipments, allowing users to create, update, and retrieve existing details from the web screen.
- Developed front-end web forms for SUPER MART to create purchase orders (POs), book shipments, trigger email alerts, and generate PDF files capturing the order details stored in the DB.
- Played a key role in requirements gathering, analysis, estimation, preparation of functional specifications (FS) and technical specifications (TS), application development, daily support operations, incident management, and defect fixing.
- Designed and identified business logic components; handled database design and query formation, development, and UAT support.
- Involved in unit testing, scope-creep management, and report generation; served as a single point of contact for consistent support and deliverables to the client manager.
- Demonstrated creative problem solving and decision making to address complicated issues.
- Served as primary contact for production deployment activities, including server maintenance and outages; acted as application support analyst for major application issues, legacy bugs, and enhancements; performed defect analysis and bug fixing.
- Involved in application server monitoring using Nagios, Perforce, and UNIX shell scripts.
- Played a key role as SPOC for DVV and CC audits in project-level ASM internal audits.