Hadoop Developer/admin Resume
Boston, MA
SUMMARY
- Over 8 years of experience with the Hadoop stack, Cloudera Certified Developer for Apache Hadoop (CCDH), passionate about working in Big Data and Analytics environments.
- Good experience in application and product development across the full SDLC, primarily using Hadoop, Java, Mainframe and ETL technologies, along with data analysis.
- Proven skills in establishing strategic direction while remaining technically strong in design, implementation, and deployment. Collected and translated business requirements into robust, scalable distributed architectures and designs.
- Experience writing MapReduce programs on Apache Hadoop to process Big Data.
- Experience in installing, configuring, supporting and monitoring Hadoop clusters using Apache and Cloudera distributions and AWS.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce model.
- Experience in using Pig, Hive, Sqoop, HBase and the Cloudera VM.
- Extensive experience with big-data ETL and query tools such as Pig Latin and HiveQL.
- Worked on the Kafka messaging system; able to ingest data from Kafka into Spark.
- Hands on experience in big data ingestion tools like Flume and Sqoop
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice versa.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Expertise in writing Hadoop Jobs for analyzing data using Hive and Pig
- Extending Hive and Pig core functionality by writing custom UDFs.
- Experienced in integrating various data sources such as RDBMS, spreadsheets, and text files using Java 1.5 and shell scripting.
- Familiar with Java virtual machine (JVM) and multi-threaded processing.
- Set up standards and processes for Hadoop based application design and implementation.
- Experience in managing and reviewing Hadoop Log files.
- Used Elasticsearch to index, fetch, and filter data.
- Worked on AWS, submitting jobs on EC2 and EMR.
- Working knowledge on Kubernetes.
- Extensive experience with SQL, PL/SQL and database concepts.
- Good knowledge on network protocols, TCP/IP configuration and network architecture.
- Worked on NoSQL databases including HBase, Cassandra.
- Knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Experience in developing solutions to analyze large data sets efficiently
- Experience in designing, developing and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services.
TECHNICAL SKILLS
Tools and frameworks: Hive, Sqoop, Pig, Puppet, Ambari, HBase, MongoDB, Cassandra, PowerPivot, Flume, Spark, Jenkins, Vertica
Java & J2EE Technologies: Core Java 1.5, Servlets 2.4
Operating Systems: Windows 95/98/2000/XP/Vista/7/8, Unix, Linux, Solaris
IDE Tools: Eclipse 3.2.2, NetBeans 6.1, RSA, RAD, Oracle WebLogic Workshop
Methodologies: Agile/ Scrum, Waterfall
Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL
Programming or Scripting Languages: C, Java, SQL, Unix Shell Scripting, Python, Scala
Databases: Oracle 11g/10g/9i, MySQL, NoSQL
PROFESSIONAL EXPERIENCE
Confidential, Dallas -TX
Hadoop Developer
Responsibilities:
- Participated in requirement gathering and converting the requirements into technical specifications
- Analyzed large data sets by running Hive queries and Pig scripts
- Analyzed log files through Hive, loaded JSON-format data into Hive, and worked on external and internal tables and Hive optimization techniques.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
- Involved in scrum meetings and worked in agile methodology.
- Worked with Spark broadcast variables and RDD joins.
- Used various transformations in developing Spark code and worked on performance tuning of Spark applications.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Integrated Spark with Kafka by creating producer objects and sending data from producers to Kafka clusters.
- Stored data in Avro and Parquet formats.
- Used Elasticsearch to fetch and ingest data.
- Used KafkaUtils to consume data from Kafka in Spark.
- Worked on code review and testing of Spark code developed in Scala.
- Implemented Spark applications in both local mode (for review) and distributed mode.
- Analyzed airline data using Spark and Flume, and worked on log files.
- Worked on scripting using Chef and Puppet.
- Worked on web server logs: ingested data into HDFS using Flume and analyzed it using Spark.
- Extracted files from Cassandra through Sqoop and placed in HDFS and processed.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Responsible for overseeing and writing scripts that prepare data for the analysts.
- Worked on HBase database creation and data insertion from Pig into HBase and Hive.
- Worked on performance tuning in HBase.
- Submitted MR jobs on EMR cluster.
- Load and transform large sets of structured, semi structured and unstructured data.
- Responsible for loading data from Linux file systems to HDFS.
- Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run Map Reduce jobs in the backend.
- Liaised with various technical teams to resolve issues.
- Created and maintained Technical documentation for launching HADOOP Clusters and for executing Hive queries and Pig Scripts
Environment: Hadoop, Java (JDK 1.6), Hive, Pig, Sqoop, MapReduce, Flat files, Oracle 11g/10g, MySQL, Linux, Spark, AWS, Hortonworks HDP 2.5, Vertica
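The Hadoop streaming and log-analysis work above can be sketched as a mapper/reducer pair. This is a minimal illustration rather than the project code: the log layout (severity level in the third whitespace-separated field) is a hypothetical stand-in for the real web server log format.

```python
"""Minimal sketch of a Hadoop-streaming-style mapper/reducer for log
analysis. Hadoop streaming pipes raw lines to the mapper on stdin and
feeds the key-sorted mapper output to the reducer, so both stages are
plain line-in/line-out functions that can also be tested locally.
The log field positions below are hypothetical."""
from itertools import groupby

def mapper(lines):
    # Emit "level\t1" for every parseable log line.
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            yield f"{fields[2]}\t1"

def reducer(pairs):
    # Streaming hands the reducer key-sorted input; sum counts per key.
    keyed = (p.split("\t") for p in pairs)
    for level, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{level}\t{sum(int(kv[1]) for kv in group)}"

# On a cluster the two stages run as separate processes, roughly:
#   hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py ...
```

The same pipeline can be exercised locally with shell pipes (mapper, `sort`, reducer), which is a common way to sanity-check streaming jobs before submitting them.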
Confidential, Boston MA
Hadoop developer/Admin
Responsibilities:
- Configured Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, Zookeeper and Sqoop.
- Developed shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Involved in collecting and aggregating large amounts of log data and staging it in HBase/HDFS for further analysis.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Involved in writing shell scripts for rolling day-to-day processes and its automation.
- Performed data analysis and querying with Hive and Pig on Cloudera Distributed Hadoop (CDH).
- Transformed massive amounts of raw data into actionable analytics, including financial, market, and product data analysis, using BI tools.
Environment: Hadoop, Map Reduce, Hue, Hive, HDFS, PIG, Sqoop, Cloudera, ZooKeeper, CDH4 & CDH5, Oracle, PL/SQL, Linux, Tableau
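The daemon health checks described above were shell scripts; the same logic can be sketched in Python. The expected-daemon set below is illustrative, and the input format (one `pid ClassName` pair per line, as printed by `jps`) is an assumption about the node's process listing.

```python
"""Sketch of the Hadoop daemon health-check logic (the original was a
shell script). It compares a jps-style process listing against the
daemons expected on the node; the expected set here is illustrative."""

EXPECTED_DAEMONS = {"NameNode", "DataNode", "ResourceManager", "NodeManager"}

def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return, sorted, the expected daemons absent from the jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:          # "pid ClassName" pairs only
            running.add(parts[1])
    return sorted(expected - running)

# On a live node this would wrap the real command, e.g.:
#   out = subprocess.run(["jps"], capture_output=True, text=True).stdout
# and raise an alert for each name returned by missing_daemons(out).
```

A cron job running such a check per node is one simple way to react to daemon failures before they surface as job errors.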
Confidential
Hadoop/Big Data Analyst
Responsibilities:
- Developed Map Reduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables.
- Involved in data ingestion into HDFS using Sqoop from a variety of sources, using connectors such as JDBC and appropriate import parameters.
- Responsible for managing data from various sources and their metadata.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Installed and configured Hive and wrote Hive UDF’s that helped spot market trends.
- Used Hadoop streaming to process terabytes of data in XML format.
- Ran Hive queries in Spark SQL for analyzing and processing the data; used Scala to perform transformations and apply business logic.
- Implemented partitioning, dynamic partitioning, indexing and bucketing in Hive.
- Loaded the dataset into Hive for ETL Operation.
- Stored processed data in parquet file format.
- Streamed data from data source using Flume.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Worked with SparkContext, Spark SQL, DataFrames, Pair RDDs, and Spark Streaming.
- Developed a Flume ETL job that handled data from an HTTP source with an HDFS sink.
- Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
- Involved in creating Hive Tables, loading with data and writing Hive queries, which will invoke and run Map Reduce jobs in the backend.
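Converting a Hive query into Spark transformations, as mentioned above, typically amounts to rewriting a GROUP BY as a map followed by reduceByKey. The actual work used Scala RDDs; the sketch below mimics that dataflow in plain Python so it runs without a Spark installation, and the flights dataset and its column layout are hypothetical.

```python
"""Illustration of translating a Hive GROUP-BY aggregation into the
map/reduceByKey chain used in Spark RDD code. This plain-Python mimic
shows the dataflow only; real code would use Spark (Scala or PySpark).
The `rows` data below is hypothetical."""

def reduce_by_key(pairs, fn):
    # Equivalent of RDD.reduceByKey: combine values sharing a key with fn.
    acc = {}
    for k, v in pairs:
        acc[k] = fn(acc[k], v) if k in acc else v
    return sorted(acc.items())

# Hive:  SELECT carrier, SUM(delay) FROM flights GROUP BY carrier;
# Spark: flights.map(r => (r.carrier, r.delay)).reduceByKey(_ + _)
rows = [("AA", 12), ("DL", 3), ("AA", 5), ("DL", 0)]
pairs = ((carrier, delay) for carrier, delay in rows)   # the map step
totals = reduce_by_key(pairs, lambda a, b: a + b)       # the reduceByKey step
```

The point of the translation is that the reduce function must be associative and commutative, since Spark applies it per-partition before shuffling, which is also what makes it cheaper than a full GROUP BY shuffle of raw rows.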
Confidential
Java Developer
Responsibilities:
- Responsible and active in the analysis, design, implementation and deployment of full Software Development Lifecycle (SDLC) of the project.
- Designed and developed user interface using JSP, HTML and JavaScript.
- Defined the search criteria and pulled the customer's record from the database, made the required changes, and saved the updated record back to the database.
- Validated the fields of user registration screen and login screen by writing JavaScript validations.
- Used DAO and JDBC for database access.
- Developed stored procedures and triggers using PL/SQL in order to calculate and update the tables to implement business logic.
- Designed and developed XML processing components for dynamic menus in the application.
- Involved in postproduction support and maintenance of the application.
- Involved in the analysis, design, implementation, and testing of the project.
- Implemented the presentation layer with HTML, XHTML and JavaScript.
Environment: Java 1.5, Oracle 11g, HTML, XML, SQL, J2EE, JUnit