Project Engineer Resume
Irving, Texas
PROFESSIONAL SUMMARY:
- Expertise in Hadoop ecosystem components (HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Spark, Spark SQL, Spark Streaming, and Hive) for scalable, distributed, and high-performance computing.
- Experienced in installing, maintaining, and configuring Hadoop clusters.
- Strong knowledge of creating and monitoring Hadoop clusters on Amazon EC2, VMs, Hortonworks Data Platform 2.1 and 2.2, and CDH3/CDH4 with Cloudera Manager, running on Linux and Ubuntu.
- Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
- Good knowledge of single-node and multi-node cluster configurations.
- Expertise in the Scala programming language and Spark Core.
- Worked with AWS-based data ingestion and transformations.
- Experienced with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Good knowledge of Amazon EMR, Amazon RDS, S3 buckets, DynamoDB, and Redshift.
- Analyze data, interpret results, and convey findings in a concise and professional manner.
- Partner with the Data Infrastructure team and business owners to implement new data sources and ensure consistent definitions are used in reporting and analytics.
- Promote a full-cycle approach, including request analysis, creating or pulling datasets, report creation and implementation, and providing final analysis to the requestor.
- Very good understanding of SQL, ETL, and data warehousing technologies.
- Expert in T-SQL, creating and using stored procedures, views, and user-defined functions, and implementing business intelligence solutions using SQL Server 2000/2005/2008.
- Developed Web-Services module for integration using SOAP and REST.
- Good experience with Kafka and Storm.
- Knowledge of Java virtual machines (JVM) and multithreaded processing.
- Good exposure to IDE and reporting tools such as Eclipse, NetBeans, and iReport.
- Excellent exposure to database designing and modeling using E/R diagrams.
- Java developer with extensive experience with various Java libraries, APIs, and frameworks.
- Hands-on development experience with RDBMS, including writing complex SQL queries, stored procedures, and triggers.
- Sound knowledge of designing data warehousing applications using tools such as Teradata, Oracle, and SQL Server.
- Experience using the Talend ETL tool.
- Strong in databases like Sybase, DB2, Oracle, MS SQL, Clickstream.
- Strong working experience with Snowflake.
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Spark, Splunk, ZooKeeper, and Kafka
NoSQL Databases: HBase, Cassandra
Monitoring and Reporting: Tableau, custom shell scripts
Hadoop Distributions: AWS, Hortonworks, Cloudera, MapR
Build Tools: SQL Developer
Programming & Scripting: Java, SQL, Shell Scripting, Python, Scala
Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/REST services
Databases: Oracle, MySQL, MS SQL Server, Teradata
Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript, AngularJS
Version Control: SVN, CVS, Git
Operating Systems: Linux, Unix, Mac OS X, CentOS, Windows 10, Windows 8, Windows 7, Windows Server 2008/2003
PROFESSIONAL EXPERIENCE:
Confidential, Irving, Texas
Project Engineer
Responsibilities:
- Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
- Involved in loading data from an Oracle database into HDFS using Sqoop queries.
- Implemented MapReduce programs to compute top-K results, following MapReduce design patterns.
- Installed, configured, and maintained Apache Hadoop clusters for analytics and application development, along with Hadoop tools such as Hive, HSQL, Pig, HBase, OLAP, ZooKeeper, Avro, Parquet, and Sqoop on Linux ARCH.
- Experience in applying structured modeling to unstructured data.
- Developed a data pipeline using Flume, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in loading the generated HFiles into HBase for faster access to a large customer base without a performance hit.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Wrote test cases, analyzed test results, and reported them to product teams.
- Worked with AWS Data Pipeline.
- Managed Hadoop workflows using Oozie.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs; used Sqoop to import and export data between HDFS and RDBMS for visualization and report generation.
- Involved in migrating ETL processes from Oracle to Hive to test ease of data manipulation.
- Worked on functional, system, and regression testing activities using Agile methodology.
- Worked on a Python plugin for MySQL Workbench to upload CSV files.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Confidential
Project Engineer
Responsibilities:
- Installed and configured MapReduce, Hive, and HDFS; implemented a CDH3 Hadoop cluster on CentOS.
- Assisted with performance tuning and monitoring.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
Confidential
Project Trainee
Responsibilities:
- The project developed a web-based application to eliminate paperwork in the hospital and laboratories by reading data from different instruments, storing it in a relational database, and generating business intelligence reports for management.
- Designed and implemented the training and reports modules of the application using Servlets, JSP and Ajax.
- Developed custom JSP tags for the application.
- Wrote queries for fetching and manipulating data using the iBatis ORM framework.
- Used Quartz schedulers to run jobs sequentially at scheduled times.
- Implemented design patterns like Filter, Cache Manager and Singleton to improve the performance of the application.
- Implemented the reports module of the application using Jasper Reports to display dynamically generated reports for business intelligence.
- Deployed the application at the client's location on Tomcat Server.
Environment: HTML, JavaScript, Ajax, Servlets, JSP, iBatis, Tomcat Server, PostgreSQL, Jasper Reports.
