Hadoop Developer Resume
Newark, CA
SUMMARY:
- 7+ years of experience in IT, including 4+ years of experience with the Hadoop ecosystem and Java technologies such as HDFS, MapReduce, Apache Pig, Hive, HBase and Sqoop.
- In-depth knowledge of Hadoop architecture and Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker.
- Experience in writing MapReduce programs with Apache Hadoop for analyzing Big Data.
- Hands-on experience in writing ad-hoc queries for moving data from HDFS to Hive and analyzing the data using HiveQL.
- Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
- Experience in writing Hadoop jobs for analyzing data using Pig Latin scripts.
- Good knowledge of analyzing data in HBase using Hive and Pig.
- Working knowledge of NoSQL databases such as HBase and Cassandra.
- Good knowledge of AWS services such as EMR, EC2, EBS, S3 and RDS, which provide fast and efficient processing of Big Data.
- Experience in integrating BI tools such as Tableau and pulling the required data into the BI tool's in-memory engine.
- Experience in launching EC2 instances for Amazon EMR using the AWS Console.
- Extended Hive and Pig core functionality by writing custom UDFs, UDAFs and UDTFs.
- Experience in administrative tasks such as installing Hadoop and ecosystem components such as Hive and Pig in distributed mode.
- Experience in using Apache Flume for collecting, aggregating and moving large amounts of data from application servers.
- Passionate about working in Big Data and analytics environments.
- Knowledge of reporting tools such as Tableau for performing analytics on data in the cloud.
- Extensive experience with SQL, PL/SQL, Shell Scripting and database concepts.
- Experience with front end technologies like HTML, CSS and JavaScript.
- Experience in working on Windows and UNIX/Linux platforms with technologies such as Big Data, SQL, XML, HTML, Core Java and Shell Scripting.
TECHNICAL SKILLS:
Database: DB2, MySQL, Oracle, MS SQL Server
Languages: Core Java, PIG Latin, SQL, Hive QL, Shell Scripting and XML
APIs/Tools: NetBeans, Eclipse, MySQL Workbench, Visual Studio
Web Technologies: HTML, XML, JavaScript, CSS
Big Data Ecosystem: HDFS, Pig, MapReduce, Hive, Sqoop, Flume, HBase
Operating System: Unix, Linux, Windows XP
Visualization Tools: Tableau, Zeppelin
Virtualization Software: VMware, Oracle VirtualBox
Cloud Computing Services: AWS (Amazon Web Services).
PROFESSIONAL EXPERIENCE:
Confidential, Portland, OR
Hadoop Developer
Responsibilities:
- Analyzed the requirements to set up a cluster.
- Developed MapReduce programs in Java for parsing the raw data and populating staging tables (see the sketch after this list).
- Created Hive queries to compare the raw data with EDW reference tables and perform aggregations.
- Imported and exported data into HDFS and Hive using Sqoop.
- Wrote Pig scripts to process the data.
- Designed and developed Hadoop, Spark and Java components.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Involved in HBase setup and in storing data into HBase for further analysis.
- Developed Unix/Linux Shell Scripts and PL/SQL procedures.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries in HiveQL, which run internally as MapReduce jobs.
- Loaded part of the data into Cassandra for fast retrieval.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
- Extracted data from Cassandra through Sqoop, placed it in HDFS and processed it.
- Implemented Big Data solutions on the Hortonworks distribution and the AWS cloud platform.
- Developed Pig Latin scripts for handling data formatting.
- Extracted data from MySQL into HDFS using Sqoop.
- Managed and monitored the Hadoop cluster using Cloudera Manager.
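A representative sketch of the kind of MapReduce parsing code described above; the class name RawRecordMapper, the tab-delimited layout and the field positions are illustrative assumptions, not the actual project code.

// Illustrative only: parses delimited raw lines and emits cleaned
// key/value pairs for a staging-table load. Layout is an assumption.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RawRecordMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split("\t");
        if (fields.length < 3) {
            return; // skip malformed records instead of failing the job
        }
        outKey.set(fields[0].trim());                              // assumed record id
        outValue.set(fields[1].trim() + "\t" + fields[2].trim());  // assumed staging columns
        context.write(outKey, outValue);
    }
}

A mapper like this is wired into a driver with job.setMapperClass(...), and its output is then loaded into the staging tables.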
Environment: Hadoop, Cloudera distribution, Hortonworks distribution, AWS, EMR, Azure cloud platform, HDFS, MapReduce, DocumentDB, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, Core Java, Impala, HiveQL, Spark, UNIX/Linux Shell Scripting.
Confidential, Newark, CA
Big Data Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from LINUX file system to HDFS.
- Working experience with HDFS admin shell commands.
- Experience with ETL methods for data extraction, transformation and loading in corporate-wide ETL solutions and data warehouse tools for reporting and data analysis.
- Significantly decreased application run times by applying Ab Initio performance-tuning procedures.
- Knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode.
- Experience with different Hadoop distributions such as Cloudera and Hortonworks.
- Hands-on experience with Cassandra.
- Developed Ab Initio graphs for batch processing.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Created Hive tables, loaded data into them and wrote Hive UDFs (see the sketch after this list).
- Hands-on use of Sqoop to import data from RDBMS into HDFS and export it back.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation for the BI team.
- Used Sqoop, Avro, Hive, Pig, Java and MapReduce daily to develop ETL, batch processing and data storage functionality.
- Supported the implementation and execution of MapReduce programs in a cluster environment.
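A minimal sketch of the style of Hive UDF mentioned above; the class name NormalizeText and the trim/upper-case rule are illustrative assumptions, not the actual project code.

// Illustrative Hive UDF: normalizes a free-text column before comparison.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class NormalizeText extends UDF {

    public Text evaluate(Text input) {
        if (input == null) {
            return null; // preserve Hive NULL semantics
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}

A UDF like this is packaged in a jar, added with ADD JAR and registered with CREATE TEMPORARY FUNCTION before being used in HiveQL queries.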
Environment: Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Ab Initio, Cassandra, Flume, Java, SQL, Cloudera Manager, Eclipse, Unix Shell Scripting, YARN.
Confidential - San Francisco, CA
Hadoop Engineer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for Cluster maintenance, Adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups, Manage and review Hadoop log files.
- Worked hands-on with the ETL process: handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Extracted data from Teradata into HDFS using Sqoop.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior segments such as Shopping Enthusiasts, Travelers, Auto Intenders and Music Lovers.
- Exported the patterns analyzed back to Teradata using Sqoop.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that are triggered independently based on time and data availability.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Developed Pig UDFs to pre-process the data for analysis (see the sketch after this list).
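A brief sketch of a Pig UDF in the spirit of the pre-processing described above; the class name ToSegment and the code-to-segment mapping are made-up examples, not the real business rules.

// Illustrative Pig EvalFunc: maps a raw category code to a user-interest segment.
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class ToSegment extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        String code = input.get(0).toString();
        switch (code) {
            case "SHP": return "Shopping Enthusiast";
            case "TRV": return "Traveler";
            case "AUT": return "Auto Intender";
            case "MUS": return "Music Lover";
            default:    return "Other";
        }
    }
}

In a Pig script the jar is added with REGISTER and the function is then invoked by its fully qualified class name or through DEFINE.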
Environment: Cloudera Manager, Hive, Java, Pig, Sqoop, Oozie, Linux
Confidential
Java Developer
Responsibilities:
- Created design documents and reviewed them with the team, in addition to assisting the business analyst/project manager in explaining them to the line of business.
- Responsible for understanding the scope of the project and requirement gathering.
- Involved in the analysis, design, construction and testing of the application.
- Developed the web tier using JSP to show account details and summary.
- Designed and developed the UI using JSP, HTML, CSS and JavaScript.
- Used the Tomcat web server for development purposes.
- Involved in creating test cases for JUnit testing (see the sketch after this list).
- Used Oracle as the database and Toad for query execution; also involved in writing SQL scripts and PL/SQL code for procedures and functions.
- Developed the application using Eclipse and used Maven as the build and deployment tool.
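A small JUnit 4 sketch of the style of test cases mentioned above; the totalBalance helper is a hypothetical stand-in for the real account-summary logic, not the project's actual service class.

// Illustrative only: the helper below is a placeholder for the logic under test.
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class AccountSummaryTest {

    // Hypothetical helper mirroring the kind of logic under test.
    static double totalBalance(double[] balances) {
        double total = 0.0;
        for (double b : balances) {
            total += b;
        }
        return total;
    }

    @Test
    public void totalBalanceIsSumOfAccountBalances() {
        assertEquals(350.5, totalBalance(new double[] {100.0, 250.5}), 0.001);
    }
}

Tests of this style run from Eclipse or as part of the Maven build.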
Environment: Java, J2EE, Servlets, JSP, JUnit, AJAX, XML, JavaScript, Maven, Eclipse, Apache Tomcat, Oracle.