Hadoop Spark Developer Resume
Minneapolis, MN
SUMMARY:
- 8+ years of professional IT experience, including 3+ years of comprehensive experience working with Apache Hadoop ecosystem components and Spark Streaming.
- Over 3 years of development experience with Big Data Hadoop ecosystem components and related tools, covering data ingestion, import/export, storage, querying, pre-processing, and analysis of big data.
- Working expertise in handling terabytes of structured and unstructured data in large cluster environments.
- Experience in using SDLC methodologies like Waterfall, Agile Scrum, and TDD for design and development.
- Expertise in implementing Spark modules and tuning its performance.
- Experienced in performance tuning of Spark applications using various resource-allocation techniques and transformations, reducing shuffles and improving data locality.
- Expertise in Kerberos Security Implementation and securing the cluster.
- Expertise in creating Hive internal/external tables and views using a shared metastore, writing HiveQL scripts, performing data transformation and file processing, and building analytics using Pig Latin scripts.
- Expertise in writing custom UDFs in Pig & Hive Core Functionality.
- Developed, deployed and supported several Map Reduce applications in Java to handle different types of data.
- Worked with various compression and serialization techniques such as Snappy, LZO, and Avro.
- Hands-on experience with Avro and Parquet file formats, following best practices and improving performance using partitioning, bucketing, map-side joins, and indexes (a minimal sketch of this pattern follows the summary).
- Experience in data load management, importing and exporting data between HDFS and relational and non-relational database systems using Sqoop, Flume, and Apache NiFi, with efficient column mappings and consistent data.
- Exported data to various Databases like Teradata (Sales Data Warehouse), SQL-Server, Cassandra using Sqoop.
- Experienced in creating shell scripts to push data loads from various sources from the edge nodes onto the HDFS.
- Experienced in performing code reviews and closely involved in smoke-testing and retrospective sessions.
- Experience in scheduling and monitoring jobs using Oozie and Crontab.
- Experienced in Microsoft Business Intelligence tools, developing SSIS (Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services), and building key performance indicators and OLAP cubes.
- Hands-on experience with the reporting tool Tableau, creating attractive dashboards and worksheets.
- Experience in JAVA, J2EE, WEB SERVICES, SOAP, HTML and XML related technologies.
- Have closely worked with the technical teams, business teams and product owners.
- Strong analytical and problem-solving skills and ability to follow through with projects from inception to completion.
- Ability to work effectively in cross-functional team environments, excellent communication and interpersonal skills.
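The partitioning, bucketing, and map-side (broadcast) join practices referenced above can be illustrated with a minimal PySpark sketch. The paths, table names (dim_stores, analytics.sales_enriched), and columns (store_id, sale_date) below are hypothetical placeholders, not taken from any actual project.

```python
# Hypothetical sketch of the partitioning / map-side (broadcast) join pattern
# described above; paths, table and column names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("sales-partitioning-example")
         .enableHiveSupport()
         .getOrCreate())

# Read raw Avro data (requires the spark-avro package on the classpath).
sales = spark.read.format("avro").load("/data/raw/sales")

# Small dimension table: broadcasting it forces a map-side join.
stores = spark.table("dim_stores")
enriched = sales.join(broadcast(stores), on="store_id", how="left")

# Write as Parquet, partitioned by date and bucketed by store for faster scans.
(enriched.write
 .partitionBy("sale_date")
 .bucketBy(32, "store_id")
 .sortBy("store_id")
 .format("parquet")
 .mode("overwrite")
 .saveAsTable("analytics.sales_enriched"))
```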
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Zookeeper, Ambari, Storm, Spark, Kafka, Apache NiFi
NoSQL Database: HBase
Monitoring and Reporting: Tableau, Custom Shell Scripts
Hadoop Distribution: Hortonworks, Cloudera, MapR
Build Tools: Maven, SQL Developer
Programming and Scripting: Java, C, C++, JavaScript, Shell Scripting, Python, Scala, Pig Latin, HiveQL
Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/REST services
Databases: Oracle, MySQL, MS SQL Server, Vertica, Teradata
Analytics Tools: Tableau, Microsoft SSIS, SSAS and SSRS
Web Dev. Technologies: HTML, XML, JSON, CSS, JQUERY, JavaScript
IDE Dev. Tools: Eclipse 3.5, NetBeans, Oracle JDeveloper 10.1.3, SOAP UI, Ant, Maven, RAD
Operating Systems: Linux, Unix, Windows 8, Windows 7, Windows Server 2008/2003
Network protocols: TCP/IP, UDP, HTTP, DNS, DHCP
PROFESSIONAL EXPERIENCE:
Confidential, Minneapolis, MN
Hadoop Spark Developer
Responsibilities:
- Worked on different file formats like Sequence, XML, JSON, and Map files using MapReduce programs.
- Made key decisions on the polling interval for stream processing and on which Hadoop stack components to use for better performance.
- Proposed and implemented a solution for their long-time issue of ordering the data in Kafka queues.
- Designed and implemented an ETL framework using Sqoop, Pig, and Hive to automate the process of regularly ingesting data from the source and making it available for consumption.
- Worked on importing and exporting data into HDFS and Hive using Sqoop, and built analytics on Hive tables using HiveContext.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Loaded and transformed large sets of semi-structured and unstructured data in HBase and Hive.
- Implemented the MapReduce programming model with XML, JSON, and CSV file formats. Used SerDe JARs to load JSON- and XML-format data coming from Kafka queues onto Hive tables.
- Implemented UDFs for Hive extending the GenericUDF, UDTF, and UDAF base classes to change time zones, implement logic, and extract required parameters according to the business specification.
- Extensive working knowledge of partitioning, UDFs, performance tuning, and compression-related properties on Hive tables.
- Developed the UNIX shell scripts for creating the reports from Hive data.
- Implemented Spark scripts in Python to extract the required data from data sets and store it on HDFS.
- Developed Spark scripts and Python functions that perform transformations and actions on data sets.
- Configured Spark Streaming in Python to receive real-time data from Kafka and store it on HDFS (see the sketch after this section).
- Involved in optimizing the Hive queries using Map-side join, Partitioning, Bucketing and Indexing.
- Involved in tuning the Spark modules with various memory and resource-allocation parameters, setting the right batch interval, and varying the number of executors to meet increasing load over time.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Used Hue for UI based PIG script execution, Tidal scheduling and creating tables in Hive.
- Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
- Involved in planning process of iterations under the Agile Scrum methodology.
- Extensive knowledge of Apache NiFi; used and configured different processors to pre-process the incoming data, make it uniform, and format it according to the requirements.
Environment: Cloudera CDH 5.7, Apache Hadoop 2.6.0 (YARN), Spark 2.1.0, Spark ML, Flume 1.7.0, Eclipse, MapReduce, Hive 1.2.2, SQL, Sqoop 1.4.6, Zookeeper 3.5.0, NoSQL database, Apache NiFi, AWS, S3, EMR
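The Kafka-to-HDFS streaming flow described in the responsibilities above can be sketched with the Spark 2.1-era DStream API. The broker address, topic name, batch interval, and output path below are assumptions for illustration only.

```python
# Hedged sketch of the Spark Streaming + Kafka ingestion flow described above
# (Spark 2.1-era DStream API); broker, topic, and paths are placeholders.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-to-hdfs")
ssc = StreamingContext(sc, batchDuration=30)  # 30-second batch interval

# Direct stream from a Kafka topic; records arrive as (key, message) pairs.
stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["events"],
    kafkaParams={"metadata.broker.list": "broker1:9092"})

# Keep only the message payload and persist each batch to HDFS.
stream.map(lambda kv: kv[1]) \
      .saveAsTextFiles("hdfs:///data/streaming/events")

ssc.start()
ssc.awaitTermination()
```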
Confidential, San Antonio, TX
Hadoop Developer
Responsibilities:
- Worked on extracting data from the Oracle database and loading it into Hive.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly on data from Kafka queues in real time and persist it to Cassandra using the required connectors and drivers.
- Integrated Kafka, Spark, and Cassandra for streaming analytics to create a predictive model.
- Developed Scala scripts and UDFs using DataFrames in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
- Worked on modifying and executing the UNIX shell scripts files for processing data and loading to HDFS.
- Worked extensively on optimizing transformations for better performance.
- Involved in carrying out important design decisions in creating UDFs and partitioning the data in Hive tables at two different levels based on the related columns for efficient retrieval and processing of queries.
- Tweaked many options to gain a performance boost, such as trying out different executor counts and memory settings.
- The team was also involved in maintenance, adding the feature of stable time zones across all records in the database.
- Uploaded and processed more than 30 terabytes of data from various structured and unstructured, heterogeneous sources into HDFS using Sqoop and Flume, enforcing and maintaining uniformity across all the tables.
- Developed complex transformations using HiveQL to build aggregate/summary tables.
- Implemented unit tests for Pig and Hive applications.
- Developed UDFs in Python to implement functions according to the specifications (an illustrative sketch follows this section).
- Developed Spark scripts configured according to the business logic, with good knowledge of the available actions.
- Well versed with the HL7 international standards as the data was organized according to this format.
- Formatted and built analytics on top of the data sets that were compliant with HL7 standards.
- Created UDFs in Scala and Java for formatting and applying transformations to the information in HL7 versions.
- Analyzed the JSON data using the Hive SerDe API to deserialize and convert it into a readable format.
- Involved in optimizing the performance of the application using partitioning and bucketing on Hive tables, and developed efficient queries using map-side joins and indexes.
- Worked with downstream team in generating the reports on Tableau.
- Conducted code reviews to ensure system operations.
Environment: CDH 5.1.x, Hadoop 2.2.0, HDFS, MapReduce, Sqoop, Flume, Hive 2.0.x, SQL Server, TOAD, Oracle, Scala 2.9.1, PL/SQL, Eclipse, Java, Shell scripting, Unix, Cassandra, HL7 standard
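One way the Python UDF and time-zone normalization work mentioned above could look is a script invoked through Hive's TRANSFORM (streaming) clause. The column layout and the fixed source offset are assumptions for illustration, not the original specification.

```python
#!/usr/bin/env python
# Illustrative sketch of a Python "UDF" invoked through Hive's TRANSFORM
# clause; column layout and the source offset are assumptions.
import sys
from datetime import datetime, timedelta

# Offset, in hours, of the source system's local time relative to UTC (assumed).
SOURCE_UTC_OFFSET_HOURS = -6

for line in sys.stdin:
    record_id, local_ts, payload = line.rstrip("\n").split("\t")
    # Parse the local timestamp and shift it to UTC so every record
    # carries a consistent time zone.
    parsed = datetime.strptime(local_ts, "%Y-%m-%d %H:%M:%S")
    utc_ts = parsed - timedelta(hours=SOURCE_UTC_OFFSET_HOURS)
    print("\t".join([record_id, utc_ts.strftime("%Y-%m-%d %H:%M:%S"), payload]))
```

In Hive, a script like this would typically be registered with ADD FILE and applied via SELECT TRANSFORM(...) USING 'python normalize_tz.py' over the source table.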
Confidential, Bloomington, IL
Jr. Hadoop Developer
Responsibilities:
- Developed MapReduce programs in Python using the Hadoop Streaming API to parse the raw data, populate staging tables, and store the refined data in partitioned Hive tables (a minimal sketch follows this section).
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Converted applications that were on the MapReduce architecture to Spark using the Python API, implementing the business logic.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Imported Teradata datasets onto the HIVE platform using Teradata JDBC connectors.
- Was involved in writing FastLoad and MultiLoad scripts to load the tables.
- Developed multiple Kafka Producers and Consumers as per the software requirement specifications.
- Developed Hive jobs to parse the logs, structure them in tabular format to facilitate effective querying on the log data.
- Extensively used UNIX for shell Scripting and pulling the Logs from the Server.
- Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
- Used the Avro data serialization system to handle JSON data formats.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Completed testing of integration and tracked and solved defects.
- Used AWS services like EC2 and S3 for small data sets.
- Involved in loading data from the UNIX file system to HDFS.
Environment: Hadoop Hortonworks 2.2, Hive, Pig, HBase, Sqoop, Flume, Oozie, AWS, S3, EC2, EMR, Spring, Kafka, SQL Assistant, Python, UNIX, Teradata
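The Hadoop Streaming jobs noted above (parsing raw logs into staging tables) follow a mapper/reducer pattern along these lines. The log layout (HTTP status in the ninth field) and the map/reduce command-line switch are illustrative assumptions.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming sketch in the spirit of the log-parsing jobs above;
# the log format and field positions are assumptions for illustration.
import sys

def mapper():
    # Emit (status_code, 1) for each access-log line.
    for line in sys.stdin:
        fields = line.split()
        if len(fields) > 8:
            print("%s\t1" % fields[8])

def reducer():
    # Sum counts per status code; streaming input arrives sorted by key.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key and current_key is not None:
            print("%s\t%d" % (current_key, count))
            count = 0
        current_key = key
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

Such a script would typically be shipped with -file and passed as both the -mapper (with the "map" argument) and -reducer of a hadoop-streaming job.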
Confidential, Dyersburg, TN
Java Developer
Responsibilities:
- Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
- Used Log4J logging framework to write Log messages with various levels.
- Involved in fixing bugs and minor enhancements for the front-end modules.
- Performed live demos of functional and technical reviews.
- Supported the testing team with maintenance during System, Integration, and UAT testing.
- Ensured quality in the deliverables to the product owners and business team.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Implemented Action classes and server-side validations for account activity, registration, and transaction history.
- Designed user-friendly GUI interface and Web pages using HTML, CSS, Struts, JSP.
- Involved in writing client-side scripts using JavaScript and server-side scripts using JavaBeans.
Environment: JSP, JavaScript, XML, HTML, UML, Apache Tomcat, JDBC, Oracle, SQL, Log4j
AIG Life Insurance, Jersey City, NY
Web Developer/Java Developer
Responsibilities:
- Used Microsoft Visio and Rational Rose to design the use case diagrams, class models, sequence diagrams, and activity diagrams for the SDLC process of the application.
- Developed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
- Configured the project on WebSphere 6.1 application servers.
- Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
- Used Log4J logging framework to write Log messages with various levels.
- Involved in fixing bugs and minor enhancements for the front-end modules.
- Performed live demos of functional and technical reviews.
- Supported the testing team with maintenance during System, Integration, and UAT testing.
- Ensured quality in the deliverables to the product owners and business team.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Designed user-friendly GUI interface and Web pages using HTML, CSS, Struts, JSP.
- Involved in writing client-side scripts using JavaScript and server-side scripts using JavaBeans.
Environment: JDK 1.5, JSP, WebSphere, JDBC, EJB 2.0, XML, DOM, SAX, XSLT, CSS, HTML, JNDI, Web Services, WSDL, SOAP, RAD, PL/SQL, JavaScript, DHTML, XHTML, JavaMail, PL/SQL Developer, TOAD, POI Reports, Windows XP, Red Hat Linux
Confidential, Chicago, IL
Web Developer
Responsibilities:
- Involved in various stages of enhancements in the application by doing the required analysis, development, and testing.
- Prepared the high- and low-level design documents and generated digital signatures.
- Created use case, class, and sequence diagrams for analysis and design of the application.
- Developed logic and code for the registration and validation of enrolling customers.
- Developed web-based user interfaces using the Struts framework.
- Coded and deployed JDBC connectivity in the Servlets to access the Oracle database tables on Tomcat web-server.
- Handled client-side validations using JavaScript.
- Involved in integration of various Struts actions in the framework.
- Used the Validation Framework for server-side validations.
- Created test cases for unit and integration testing.
- Integrated the front end with the Oracle database using the JDBC API through the JDBC-ODBC bridge driver.
Environment: Java Servlets, JSP, JavaScript, XML, HTML, UML, Apache Tomcat, JDBC, Oracle, SQL, Log4j