Sr. Hadoop Developer Resume
Houston, TX
PROFESSIONAL SUMMARY:
- 9+ years of experience in the IT industry, with extensive experience in Java, J2EE and Big Data technologies.
- 4+ years of exclusive experience working on Big Data technologies and the Hadoop stack.
- Strong experience working with HDFS, MapReduce, Spark, Splunk, Hive, Pig, Sqoop, Flume, Kafka, Oozie and HBase.
- Good understanding of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
- Good experience implementing end-to-end data security and governance within the Hadoop platform using Apache Knox, Apache Sentry, Kerberos, etc.
- More than one year of hands-on experience using the Spark framework with Scala.
- Good exposure to performance tuning of Hive queries, MapReduce jobs and Spark jobs.
- Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files and XML files.
- Good understanding of compression codecs used in Hadoop processing, such as Gzip, Snappy and LZO.
- Expertise in importing and exporting data from/to traditional RDBMS using Apache Sqoop.
- Tuned Pig and Hive scripts by analyzing their join, group and aggregation operations.
- Extensive work with HiveQL, join operations and custom UDFs, with good experience optimizing Hive queries.
- Worked with various Hadoop distributions (Cloudera, Hortonworks, Amazon AWS).
- Skilled in using columnar file formats such as RCFile, ORC and Parquet.
- Experience in data processing: collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Hands-on experience installing, configuring and deploying Hadoop distributions in cloud environments (Amazon Web Services).
- Good experience optimizing MapReduce jobs using combiners and custom partitioners (see the sketch after this list).
- Hands-on experience with NoSQL databases such as HBase and MongoDB.
- Expertise in back-end/server-side Java technologies such as web services, Java Persistence API (JPA), Java Message Service (JMS) and Java Database Connectivity (JDBC).
- Experience includes application development in Java (client/server), JSP, Servlet programming, Enterprise JavaBeans, Struts, JSF, JDBC, Spring, Spring Integration and Hibernate.
- Very good understanding of the Agile Scrum process.
- Experience using version control tools such as Bitbucket and SVN.
- Good knowledge of Oracle 8i, 9i and 10g databases and excellent SQL query-writing skills.
- Performed performance tuning and productivity improvement activities.
- Extensive use of use case diagrams, use case models and sequence diagrams in Rational Rose.
- Proactive in time management and problem solving; self-motivated, with good analytical skills.
- Analytical and organizational skills with the ability to multitask and meet deadlines.
- Excellent interpersonal skills in areas such as teamwork, communication and presentation to business users or management teams.
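The following is a minimal, illustrative sketch of the combiner and custom-partitioner optimization noted above. It assumes a word-count-style job with Text keys and IntWritable counts; the class name, key prefix and driver wiring are hypothetical rather than project code.

    // Minimal sketch only: Text keys, IntWritable counts and the "US" prefix are assumptions.
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class RegionPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // Route one high-volume key prefix to a dedicated reducer; hash the rest.
            if (key.toString().startsWith("US")) {
                return 0;
            }
            return (key.toString().hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Driver wiring (hypothetical job setup): the combiner pre-aggregates map output
    // locally to cut shuffle volume, and the custom partitioner controls key routing.
    //   job.setCombinerClass(IntSumReducer.class);
    //   job.setPartitionerClass(RegionPartitioner.class);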
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop, Splunk, IBM QRadar, Teradata, MapReduce, Spark, HDFS, HBase, Pig, Hive, Sqoop, Oozie, Storm, Kafka and Flume.
Streaming Technologies: Spark Streaming, Storm
Scripting Languages: Python, Bash, JavaScript, HTML5, CSS3
Programming Languages: Java, Scala, SQL, PL/SQL
Databases: Oracle, MySQL (RDBMS); HBase, MongoDB (NoSQL).
Java/J2EE Technologies: Servlets, JSP (EL, JSTL, Custom Tags), JSF, Apache Struts, JUnit, Hibernate 3.x, Log4j, JavaBeans, EJB 2.0/3.0, JDBC, RMI, JMS, JNDI.
Tools: Eclipse, Maven, Ant, MS Visual Studio, NetBeans
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, Houston, TX
Sr. Hadoop Developer
Responsibilities:
- Orchestrated the automated, routine acquisition of cyber security logs and supporting data.
- Developed multiple Extract, Transform and Load (ETL) jobs with the Pentaho Data Integration tool.
- Implemented data ingestion, ETL and stream-processing pipelines to support the cyber security data scientists.
- Worked in Splunk with the SPL query language to create alerts that are sent to the teams that consume them and take the necessary actions.
- Analyzed existing systems to evaluate their effectiveness and designed new systems to improve production workflow.
- Involved in configuring Sqoop and Flume to extract/export data from IBM QRadar and MySQL.
- Developed Spark applications for data validation, cleansing, transformation and custom aggregations (see the sketch after this list).
- Involved in system analysis, design, coding, data conversion, development and implementation.
- Involved in enhancing existing applications and creating new ones.
- Involved in retrieving data from HDFS according to requirements by writing Java (MapReduce), Pig and Hive scripts.
- Used JIRA as the work-tracking tool with a Kanban board, integrated with Confluence and with Bitbucket as the version control system.
- Used Java extensively for writing MapReduce programs in Hadoop.
- Bulk-loaded data from several databases into Hadoop using Sqoop.
- Wrote Pig and Hive logic to retrieve data from Hadoop in the required format.
- Involved in performance tuning to optimize jobs.
- Hands-on experience working with Cloudera distributions.
- Extracted data from different databases and scheduled Oozie workflows to execute these jobs daily.
- Expertise in data loading techniques such as Sqoop; transformed data in HDFS using Hive and Pig according to business requirements for aggregations.
- Developed ETL to normalize this data and publish it in Impala.
- Developed stream-processing pipelines to automate the running and training of machine learning models inside Hadoop.
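Below is a hedged sketch of the kind of Spark validation/cleansing step described above, written with the Spark Java API for consistency with the other examples in this resume; the HDFS paths and column names (event_id, event_time) are illustrative assumptions, not the project's actual schema.

    // Illustrative only: paths and column names are assumptions.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.col;

    public class LogCleanser {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("security-log-cleansing")
                    .getOrCreate();

            // Read raw JSON events, drop records missing mandatory fields,
            // and de-duplicate on the event identifier before writing back to HDFS.
            Dataset<Row> raw = spark.read().json("hdfs:///data/raw/security_logs/");
            Dataset<Row> cleansed = raw
                    .filter(col("event_id").isNotNull().and(col("event_time").isNotNull()))
                    .dropDuplicates("event_id");

            cleansed.write().mode("overwrite").parquet("hdfs:///data/curated/security_logs/");
            spark.stop();
        }
    }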
Environment: Hadoop, Splunk, IntelliJ, QRadar, HDFS, MapReduce, Spark, Scala, Hive, Pig, Sqoop, HBase, Oozie, MySQL, Bitbucket, PuTTY, Zookeeper, UNIX, Shell scripting, JavaScript, XML, HTML, Python and Bash.
Confidential, Austin, TX
Hadoop Developer
Responsibilities:
- Created batch and real-time pipelines using Spark as the main processing framework.
- Worked extensively on Hive; created numerous internal and external tables per requirements.
- Worked closely with the business, translating business requirements into technical requirements.
- Participated in design reviews and daily project scrums.
- Wrote custom Hive UDFs according to business requirements.
- Hands-on experience working with Cloudera distributions.
- Extracted data from different databases and scheduled Oozie workflows to execute these jobs daily.
- Expertise in data loading techniques such as Sqoop.
- Transformed data using Spark and Hive according to business requirements to generate various analytical datasets.
- Wrote Spark applications for data validation, cleansing, transformation and custom aggregations.
- Loaded files from Teradata into HDFS and from HDFS into Hive.
- Used Oozie for workflow design and the Oozie coordinator for scheduling workflows.
- Partitioned and bucketed Hive tables for better query performance.
- Used Spark DataFrames, Spark SQL and Spark MLlib extensively (see the sketch after this list).
- Knowledge of installing and configuring various services on the Hadoop cluster.
- Held daily scrum calls with business users on the status of deliverables.
- Communicated deliverable status to users, stakeholders and the client, and drove periodic review meetings.
- Completed tasks and the project on time per quality goals.
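The sketch below illustrates the Spark SQL/DataFrame usage over partitioned Hive tables mentioned above; the database, table and column names, and the literal partition value, are assumptions for illustration only.

    // Illustrative only: table, columns and partition value are assumptions.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DailyAggregates {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("daily-aggregates")
                    .enableHiveSupport()
                    .getOrCreate();

            // Filtering on the partition column (load_date) lets Hive prune partitions,
            // so only one day's data is scanned for the aggregation.
            Dataset<Row> daily = spark.sql(
                    "SELECT customer_id, SUM(amount) AS total_amount "
                  + "FROM curated.transactions "
                  + "WHERE load_date = '2017-01-01' "
                  + "GROUP BY customer_id");

            daily.write().mode("overwrite").saveAsTable("analytics.daily_customer_totals");
            spark.stop();
        }
    }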
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, MySQL, SVN, PuTTY, Zookeeper, UNIX, Shell scripting, JSP & Servlets, PHP, JavaScript, XML, HTML, Python and Bash.
Confidential, San Francisco, CA
Hadoop Developer
Responsibilities:
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data, and analyzed them by running Hive queries and Pig scripts.
- Stored processed data by using low-level Java APIs to ingest it directly into HBase and HDFS.
- Developed MapReduce (YARN) programs to cleanse data in HDFS obtained from heterogeneous sources and make it suitable for ingestion into the Hive schema for analysis (a sketch follows this list).
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning and slot configuration.
- Designed and developed MapReduce jobs to process data arriving in different file formats such as XML, CSV and JSON.
- Experience managing and reviewing Hadoop log files.
- Experience with Hive partitioning, bucketing and joins on Hive tables, and with Hive SerDes such as RegEx, JSON and Avro.
- Exported the analyzed data to relational databases using Sqoop to generate reports for the BI team.
- Executed cluster upgrade tasks on the staging platform before applying them to the production cluster.
- Performed maintenance, monitoring, deployments and upgrades across the infrastructure supporting all our Hadoop clusters.
- Installed and configured various components of the Hadoop ecosystem.
- Optimized Hive analytics queries, created tables/views, wrote custom UDFs and implemented Hive-based exception processing.
- Involved in moving data between relational databases with legacy labels and HDFS/HBase tables using Sqoop, in both directions.
- Replaced Hive's default Derby metastore with MySQL.
- Supported setting up the QA environment and updating configurations for implementing Pig scripts.
- Configured the Fair Scheduler to allocate resources fairly to all applications across the cluster.
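As a rough illustration of the cleansing MapReduce programs referenced above, the mapper below drops malformed delimited records and normalizes whitespace; the five-column layout and counter names are assumptions, not the actual source format.

    // Illustrative only: the 5-column CSV layout is an assumption.
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CsvCleanseMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",", -1);
            // Reject rows with the wrong column count or an empty key field.
            if (fields.length != 5 || fields[0].trim().isEmpty()) {
                context.getCounter("cleanse", "rejected").increment(1);
                return;
            }
            // Emit the record with each field trimmed, ready for loading into Hive.
            StringBuilder normalized = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) {
                    normalized.append(',');
                }
                normalized.append(fields[i].trim());
            }
            context.write(new Text(normalized.toString()), NullWritable.get());
        }
    }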
Environment: Cloudera 5.4, Cloudera Manager, Hue, Kafka, HBase, HDFS, Hive, Pig, Sqoop, MapReduce, DataStax, IBM DataStage 8.1 (Designer, Director, Administrator), Flat files, Oracle 11g/10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting.
Confidential, Conway, AR
Hadoop Developer
Responsibilities:
- Loaded data from different sources (Teradata, DB2, Oracle and flat files) into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Created various Pig scripts and wrapped them as shell commands to provide aliases for common operations in the project's business flow.
- Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes.
- Created several Hive UDFs to abstract away complex, repetitive rules.
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Involved in end-to-end implementation of ETL logic.
- Reviewed ETL application use cases before onboarding them to Hadoop.
- Developed Bash scripts to pull log files from the FTP server and process them for loading into Hive tables.
- Scheduled all Bash scripts using the Resource Manager scheduler.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Developed MapReduce programs to apply business rules to the data.
- Worked collaboratively with all levels of business stakeholders to architect, implement and test a Big Data analytical solution drawing on disparate sources.
- Created HBase tables and column families to store user event data (see the sketch after this list).
- Wrote automated HBase test cases for data quality checks using HBase command-line tools.
- Maintained the authentication module to support Kerberos.
- Implemented rack topology scripts for the Hadoop cluster.
- Implemented fixes to resolve issues related to the old Hazelcast EntryProcessor API.
- Participated with the admin team in designing and performing the CDH 3 to CDH 4 upgrade.
- Developed helper classes abstracting Cassandra cluster connections to serve as a core toolkit.
- Enhanced existing modules written in Python.
- Used dashboard tools like Tableau.
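The following is a small sketch of creating an HBase table with column families through the Java client API of that era; the table name and the two column families are illustrative assumptions rather than the project's actual schema.

    // Illustrative only: table and column family names are assumptions.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateUserEventsTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            try {
                HTableDescriptor table = new HTableDescriptor(TableName.valueOf("user_events"));
                // One family for raw event payloads, one for derived attributes.
                table.addFamily(new HColumnDescriptor("raw"));
                table.addFamily(new HColumnDescriptor("derived"));
                if (!admin.tableExists("user_events")) {
                    admin.createTable(table);
                }
            } finally {
                admin.close();
            }
        }
    }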
Environment: Hadoop, Linux, MapReduce, HDFS, HBase, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Java, Eclipse, Oracle 10g, Maven, open-source technologies (Apache Kafka, Apache Spark), ETL, Hazelcast, Git, Mockito, Python.
Confidential
Software Developer
Responsibilities:
- Troubleshoot and resolve complex DoubleClick for Publishers (DFP) issues directly for our highest revenue-generating publishers.
- Provide guidance and consultative technical expertise to front line General Support.
- Provide two-way communication and collaboration on monthly product release cycles and customer-centric product development.
- Collaborate with US/EMEA teams.
- Serve as point of contact for projects led by our escalation management team to improve client experience and internal operational effectiveness.
- Collectively handled issues such as ads not showing as intended (e.g., geotargeting not working), clicks not tracking, impression and click counts not matching third-party ad server numbers (discrepancies), feature enablement and web page tagging issues.
- Actively participated in code reviews and meetings and resolved technical issues.
- Worked cooperatively with others and took the necessary steps to ensure successful project execution, using strong verbal communication skills.
- Wrote and executed test scripts using JUnit.
- Actively involved in system testing.
- Developed an XML parsing tool for regression testing (sketched below).
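A hedged sketch of the JUnit-driven XML regression check described above; the resource path, element name and expected count are hypothetical.

    // Illustrative only: file path, element name and expected count are assumptions.
    import static org.junit.Assert.assertEquals;

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.junit.Test;
    import org.w3c.dom.Document;

    public class AdResponseXmlTest {

        @Test
        public void responseContainsExpectedCreativeCount() throws Exception {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new File("src/test/resources/ad_response.xml"));

            // Regression check: the served response should still contain three creatives.
            int creatives = doc.getElementsByTagName("creative").getLength();
            assertEquals(3, creatives);
        }
    }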
Environment: Confidential DFP, Java, JavaScript, HTML, XML, jQuery, CSS, JDK 1.5.1, JDBC, Oracle 10g, Unix.
Confidential
Java Developer
Responsibilities:
- Worked as a Java developer for a New Zealand-based client, "Enable Networks".
- Trained on the COTS products "Siebel CRM" and "Comptel Instant Link" and quickly achieved expertise.
- Developed network element interfaces such as Notification Service, Email Notification Service, NAV (Billing), GIS Interface Integration and Huawei U2000.
- Led weekly client meetings regarding updates and modifications to the developed code.
- Performed daily billing activities, billing customers on their specific billing cycle so that they receive the new bundle of plans and the previous plan is closed.
- Automated specific tasks by creating shell scripts when required.
- Scheduled walkthrough calls with the onsite team when required.
- Performed ad hoc billing when a customer's billing did not occur because of errors in the system.
- Involved in development of General Ledger module, which streamlines analysis, reporting and recording of accounting information. General Ledger automatically integrates with a powerful spreadsheet solution for budgeting, comparative analysis and tracking facility information for flexible reporting.
- Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
- Designed the user interface and implemented validation checks using JavaScript.
- Managed connectivity using JDBC for querying, inserting and data management, including triggers and stored procedures (see the sketch after this list).
- Developed various EJBs to handle business logic and data manipulation from the database.
- Involved in the design of JSPs and Servlets for navigation among the modules.
- Designed the cascading style sheets and XML portions of the Order Entry and Product Search modules and performed client-side validation with JavaScript.
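A minimal JDBC sketch of the querying pattern referenced above, using a PreparedStatement against an Oracle database; the connection string, credentials and orders schema are illustrative assumptions.

    // Illustrative only: URL, credentials and schema are assumptions.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class OrderDao {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@localhost:1521:orcl", "app_user", "changeit");
            try {
                // Parameterized query avoids SQL injection and reuses the execution plan.
                PreparedStatement ps = conn.prepareStatement(
                        "SELECT order_id, status FROM orders WHERE customer_id = ?");
                ps.setLong(1, 1001L);
                ResultSet rs = ps.executeQuery();
                while (rs.next()) {
                    System.out.println(rs.getLong("order_id") + " " + rs.getString("status"));
                }
                rs.close();
                ps.close();
            } finally {
                conn.close();
            }
        }
    }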
Environment: JAVA 1.5, J2EE, JSP, Servlets, JSTL, JDBC, Struts, ANT, XML, HTML, JavaScript, SQL, Oracle 9i, Spring 2.0, Hibernate 2.0, Log4j, WebLogic 8.1, Unix.